One Crazy Summer

Hey automators!

Summer’s been absolutely nuts. Between work stuff, family stuff, running here and there, and of course, the odd project or two, I’ve been just plain stretched for time.

Stay tuned. I’ll be coming back around shortly. I’m working on some things. Preview?

Well, Remember how Logitech decided that the Harmony Remote, one of the best things ever to happen to the world of universal remotes was going to be taken out back and killed? Yeah, I was pretty mad about that too. So, I went looking for something else to solve some automation challenges with that. So, that’s coming.

What else? Tried to buy a Raspberry Pi lately? Heh. Yeah, me too. I decided to try a different fruit for a change. So far, so good. More on that later.

More still? There’s an update on that printer situation. The dryer too.

How about a Raspberry Pi-based network console server for my network equipment?

Hang in there family, it’s coming.

It’s DNS. Again.

Oh, hello there…

Ok, so I opened Pandora’s box by starting to talk about DNS. I figure I should probably do a proper job of completely murdering the topic and kill it off for good. So – Unbound was the call. I don’t need to run an authoritative server at home any longer. Otherwise, I’d probably have stuck with BIND, honestly. I know it, it’s a pain at times, anything in it related to DNS over HTTPS or TLS is totally experimental, not ready for client-side, but the devil you know, right?

So, Unbound it is. I did a bit of a read-up, and between the Arch Linux wiki, the official docs, and a couple of random config snippets, and I had a config. As I mentioned in the other post, I used certbot to generate my DNS over TLS cert. I’m actually using the same cert for DNS over HTTPS, but the clients don’t really get to see that cert. Why? Well, these hosts also serve up other apps via https, so traefik is installed, and thus is bound to tcp/443, and I didn’t feel like messing with multiple IPs and binding different services to different IPs, so I just tied Unbound’s DNS-o-HTTPS to tcp/1443, and created a traefik service to front it with HTTPS, so it all works out in the end. Clients are none-the-wiser. Yes, there’s a little extra config, and this definitely flies in the face of my mantra of discarding technical debt. But what’s the greater debt – moving a port and making a reverse proxy entry, or setting up a whole new IP and playing around with all sorts of service bindings and ensuring that the right ports are bound the right IPs in the right places? Yeah, you’re seeing it now.

So, on to the config. It’s not the exact config, but it’s close enough. I’ve changed some bits to protect the sanctity of the innards of my home net. On my systems, I’m running Ubuntu Jammy (that’s 22.04 LTS), so I just installed the unbound package via apt, then dropped this in /etc/unbound/unbound.d/server.conf (the file doesn’t exist – you create it). There’s a handy syntax checker called unbound-checkconf, which helps you figure out where you’ve managed to fat-finger things in your config and mess it up. Ask me how I know how useful it is…

server:
    port: 53
    tls-port: 853
    https-port: 1443
    verbosity: 0
    num-threads: 2
    outgoing-range: 512
    num-queries-per-thread: 1024
    msg-cache-size: 32m
    interface: 0.0.0.0
    interface: 0.0.0.0@853
    interface: 0.0.0.0@1443
    rrset-cache-size: 64m
    cache-max-ttl: 86400
    infra-host-ttl: 60
    infra-lame-ttl: 120
    access-control: 127.0.0.0/8 allow
    access-control: 0.0.0.0/0 allow
    username: unbound
    directory: "/etc/unbound"
    use-syslog: yes
    hide-version: yes
    so-rcvbuf: 4m
    so-sndbuf: 4m
    do-ip4: yes
    do-ip6: no
    do-udp: yes
    do-tcp: yes
    log-queries: no
    log-servfail: no
    log-local-actions: no
    log-replies: no
    extended-statistics: yes
    statistics-cumulative: yes
    tls-service-key: /etc/letsencrypt/live/dns.home.somedomain.net/privkey.pem
    tls-service-pem: /etc/letsencrypt/live/dns.home.somedomain.net/cert.pem
    http-endpoint: "/dns-query"
    http-nodelay: yes
    private-address: 10.0.0.0/8
    private-address: 172.16.0.0/12
    private-address: 192.168.0.0/16
    private-address: 169.254.0.0/16
    private-domain: "home.somedomain.net"
    do-not-query-localhost: yes
    tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"
    local-zone: "10.in-addr.arpa." transparent
    local-data: "1.10.10.10.in-addr.arpa.   600 IN PTR router.home.somedomain.net."
    local-data: "2.10.10.10.in-addr.arpa.   600 IN PTR switch.home.somedomain.net."
    local-data: "3.10.10.10.in-addr.arpa.   600 IN PTR ap1.home.somedomain.net."
    local-data: "4.10.10.10.in-addr.arpa.   600 IN PTR ap2.home.somedomain.net."
    local-data: "5.10.10.10.in-addr.arpa.   600 IN PTR ap3.home.somedomain.net."
    local-data: "6.10.10.10.in-addr.arpa.   600 IN PTR printer.home.somedomain.net."
    local-data: "10.10.10.10.in-addr.arpa.  600 IN PTR server1.home.somedomain.net."
    local-data: "20.10.10.10.in-addr.arpa.  600 IN PTR server2.home.somedomain.net."


remote-control:
    control-enable: yes
    control-port: 953
    control-use-cert: "yes"
    control-interface: 127.0.0.1
    server-key-file: "/etc/unbound/unbound_server.key"
    server-cert-file: "/etc/unbound/unbound_server.pem"
    control-key-file: "/etc/unbound/unbound_control.key"
    control-cert-file: "/etc/unbound/unbound_control.pem"

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
    forward-addr: 1.0.0.1@853#cloudflare-dns.com
    forward-addr: 9.9.9.9@853#dns.quad9.net
    forward-addr: 149.112.112.112@853#dns.quad9.net
    forward-addr: 8.8.8.8@853#dns.google
    forward-addr: 8.8.4.4@853#dns.google

So, You Should Dump IPsec, Right?

Wrong. Probably.

So, Since I just posted the other day about dumping my pile of Python scripts and IPsec VPNs and moving to Tailscale for my personal use case, several folks have sparked conversations with me about the topic.

In my case, it made complete sense to do something else. I was using a solution that was the essentially held together with bubblegum, duct tape, and baling wire. It was fragile, it kept breaking, and let’s be real – I was bending the solution into a shape it wasn’t designed to be used in – which is why it kept breaking in the first place.

You see, IPsec tunnels are intended to work when you’ve got stable, fixed endpoints. Over time, things have been done so that endpoints can become dynamic. But typically just one endpoint. Suddenly, with two dynamic endpoints, results become… Unpredictable. I think that’s a kind way of putting it even. That right there, explains my repeated breakage problems.

So, if you’re still using traditional firewall & VPN in a more traditional use case, then yes, keep doing things more traditionally – keep on using IPsec VPNs. It’s quite honestly the best tool in the bag for what you’re hoping to accomplish in terms of securing data that’s in motion, provided you’re able to meet the bars of entry in terms of hardware support as well as supported feature set.

So, get rid of your firewalls? Not a chance. Get rid of my SRX firewalls and EX switches? No way, no how. You can have my Junos stuff when you pry it from my cold, dead hands. Heck, I make my living with Junos. But just like the whole story of the guy who only has a hammer and thinks everything is a nail, sometimes you’ve just got to use a different tool to do the job right.

But taking the time to think about how to break up with complexity and technical debt? Yeah, that’s totally worth your time. Sometimes that means saying goodbye to old friends, even when you forced them into places where they didn’t quite fit.

So, in the end the whole square-peg-round-hole thing? Stop doing that.

Ditching Technical Debt. Embracing Simplicity.

I work in networking. I’ve been doing that for a long time now. Along that journey, I’ve also had occasional detours into worlds like generic IT and data security as well. I also do volunteer work at a nonprofit. Plus, like many of you who work in tech, there’s stuff that lives at the home(s) of relatives that you maintain because you’re that sort of person.

Sometimes, you do it cheap, sometimes you do it right, and sometimes you do it somewhere in-between. Like where you’ve got DHCP-assigned WAN interfaces everywhere because everywhere has home-user type Internet services, or less-expensive business-class occasionally. Anyhow, you can’t always count on having the same IP in the same place twice. BUT, you want things to be secured, and you don’t just want wide-open port forwards with plain old Dynamic DNS.

How things used to work, in the IPsec days…

You’ve got some Juniper SRX firewalls you’ve bought for lab work & study previously, you want to make use of them with IPsec VPNs, but to do it right, you really need static IPs. So, what do you do? You fake it. You just pretend you’ve got static IPs on the tunnel endpoints and configure it up. The tunnels come up, you post up your BGP sessions between your st0.0 IFLs, announce some routes, put some reasonable security policies in place. Yes, I did have security policies in there. I was born at night, but it wasn’t last night, guys. But how did I keep it working with IPs changing all the time?

Here’s how I was solving that problem up until fairly recently. I’ve been hacking away at my DNS-o-Matic and DNS Made Easy updaters for a while now. The DME updater was much better, IMHO, as it directly updated a single, private zone that only I ever cared about rather than rely on someone else to sit in the middle and do the updates for me. Plus, I wrote the whole thing from the ground up using DME’s API docs, so I knew exactly how it worked, inside & out. No excuses for it doing anything I didn’t understand, and honestly, I’m really happy with how well it’s been working. It’s been a great opportunity to get better at Python, in particular doing things in a more “Pythonic” way, rather than trying to “just get it done”, or worse, trying to make it work the way I used to do things in Perl or PHP years ago. Is it iconic? Not even close, but it does works pretty darn well.

So, with these containers all ran on Intel NUCs under Ubuntu Linux at each site. There were 1 more container on each of these NUCs as part of this operation. I had a set of Telegram Bots that talked to each other as part of this network to inform each other of site IP changes. So, if HOME changed its IP, the bot at HOME sent a message to the group that included the bots for NONPROFIT and INLAWS. Those bots saw note that the IP had changed and they should go find out the new IP of HOME, so they can update their tunnel endpoints. This in turn fired off a function that used the Junos PyEZ API module to update the IPsec tunnel endpoint IPs.

Did it all work? Yes, believe it or not, this actually all worked. Was it pretty fragile and not for the faint of heart? Oh yeah, for sure. Would I recommend doing it? Not a chance. So much so that I’m not even going to share the code, apart from the DDNS updaters. The other stuff is definitely hackjob territory. So, since it was so fragile and had the tendency to break, what did I do? Well, the first few times, I drove and fixed. Which frankly, sucked. After that, I installed an OpenVPN container at each of the locations. Later, I replaced those with linuxserver/wireguard containers. But, after it all broke like twice in about a month, I’d just about had enough. I cried Uncle and decided I was going to look for some other way to do this.

And that’s when my old pal Bhupen mentioned Tailscale to me. I was already into Wireguard. So making it easier, faster, and more useful were all on my short list. Drop the tailscale client on the NUC, get it logged in announcing the local subnet into the tailnet (their name for the VPN instance), making it a “subnet router”, approve the route announcement in the portal and it’s going. I’ve got control over key expiry too. Security policy (naturally) moved from the SRX down to the tailscale gateways, but their ACL language wasn’t too difficult to wrangle. It’s all JSON, so it’s reasonably straightforward.

The new Tailscale VPNs
The new Tailscale VPNs

So, with all the scripts gone and the IPsec stripped away, what’s it all look like? Well, we also added 1 more site into the mix as well – the in-laws vacation place. They bought a place and I stuck a Raspberry Pi up there for future IOT use. Not entirely sure about the “what” yet, but they just updated the HVAC, and it’s all smart stuff, so I expect there will be instrumentation. Maybe something that spits out time series info to Influxdb or somesuch. Who knows? Or Perhaps HomeKit/Homebridge stuff. Time will tell.

In the time since I made the diagrams and wrote this up, things have also changed slightly on the homefront.. I’ve deployed a 2nd subnet router at Home. In the Tailscale docs, they say all over the place not to deploy two subnet routers with the same IP space, and generally speaking, it’s with good reason – traffic destined for those prefixes announced by those routers will be round-robin’d back and forth between them. In my case, since they’re on the same physical subnet, this is essentially ECMP routing, so no big deal. I haven’t validated if they’re really getting the hashing correct, but haven’t really noticed any ill effects yet, so I haven’t shut off the 2nd subnet router yet.

So, by dropping all the BGP sessions, IPsec tunnels, Python scripts, Telegram bots, and Docker containers, things have become much simpler, and much more stable. I’m really happy with Tailscale. So much so that I ended up subscribing at the Personal Pro tier. Great bunch of folks – can’t help but recommend them.

UPDATE: This ended up sparking a bunch of sidebar conversations. Go read what I had to say as a followup