E-mail deliverability

Messages to hotmail.com addresses started bouncing. A user on my server e-mailed from his outlook.com address to say he was having problems and I found I couldn’t reply to this either.

Microsoft use an extremely blunt instrument: blacklisting IP addresses and entire subnets. This makes some sense. If a server is routinely blasting crap at your service, simply blocking the IP early in the process avoids having to process the junk with complex filters. The downside is that if your server gets blacklisted you may not be able to do anything about it.

I’d long since implemented SPF but having been blocked by Microsoft I finally got round to putting DKIM in place and adding a DMARC policy.
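For anyone at the same stage, the three records are all just DNS TXT entries. A sketch only – example.org stands in for a real domain, the DKIM selector and public key are placeholders that come from whatever signs your outbound mail, and the reporting address is hypothetical:

```
example.org.                  IN TXT "v=spf1 mx -all"
mail._domainkey.example.org.  IN TXT "v=DKIM1; k=rsa; p=<public key from your signing software>"
_dmarc.example.org.           IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.org"
```

The SPF record here allows only the domain's MX hosts to send; a stricter or looser policy may suit your setup better.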

Having set all this up, and confident I was complying with the Outlook.com policies, I requested mitigation for my server. No joy. My server “does not qualify for mitigation”, although they won’t tell me why.

My server handles a few personal domains, one for a local charity and sends very little e-mail. I suspect the problem was caused by a user sending to a distribution list of around 60 addresses and one of those recipients clicked the junk button.

The trouble is I’m guessing. I don’t know why Microsoft blocked my server, they won’t tell me, and the tools they provide to help manage this don’t work if you send less than 100 e-mails a day.

Short of relaying mail through an expensive “deliverability” service, about the only thing I can do is change my IP. This involves changing enough DNS records that I really don’t want to make a habit of it.

So no conclusions or solutions to this problem. If you’ve encountered it, know you’re not alone.

Aruba AP-387 point to point

The Aruba AP-387 was launched a little while ago now; I first saw it demonstrated at Aruba Atmosphere 2018 in Croatia. It’s an AP designed for point-to-point links with 802.11ac 5GHz and 802.11ad 60GHz radios. The aggregate RF throughput is in the region of 2.5Gbps, which means it can maintain duplex gigabit wired speeds, and testing has shown this to be reliable up to the specified 400 metres. Should conditions cause a deterioration of the 60GHz link, the 5GHz link will likely continue to provide connectivity, albeit with lower throughput.

Installation is relatively straightforward. The APs don’t need super-accurate aiming, as the 802.11ad spec includes electronic alignment of the antenna. This can also cope with wind motion, though the more stable and well aimed the units, the better.

This install was between a sports facility building and a remote cabin beside the cycle track. The cabin had been without connectivity since construction as, for reasons nobody can explain, no data ducting was installed alongside the power feed. Attempts to drag fibre through the power duct failed and costs for ducting and fibre installation were priced at around £25k.

The AP-387 is a unified AP so a pair can be standalone, managed as Instant APs, or they can work as part of an Aruba controller environment – as was the case here. The link uses Aruba’s mesh configuration with one mesh portal and one mesh point.

This link was configured to use UNII-3 Band C channels on the 5GHz radio as the institution had an Ofcom licence for outdoor use. (Note these channels are now also available unlicensed for indoor use only at low power, as well as outdoors at high power with a licence… not that it gets confusing at all.)

The initial setup on the bench was very straightforward. The installation was handed over to experienced cabling contractors with no specific wireless expertise or specialist equipment.

And it just worked.

The AP behaves as a bridge by default, passing all tagged VLANs across the link. This network uses the same management VLAN for APs and switches, so the only deviation from standard edge switch config at the remote end was to untag this management VLAN on the uplink port.
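That untagging is a tiny change. As a sketch only – assuming an ArubaOS-Switch style CLI, a hypothetical management VLAN 100 and port 24 as the uplink to the mesh point – it amounts to:

```
vlan 100
   name "Management"
   untagged 24
exit
```

Every other VLAN stays tagged on port 24 exactly as on a standard edge uplink.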

The link length was approx 190 metres and I kept an eye on it during some quite mixed weather using a regular automated speed test. No performance drop was observed.
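The monitoring was nothing sophisticated. A minimal sketch of the idea – the numbers, threshold and function name here are illustrative, not what was actually run – would be to feed periodic speed-test results through something like this:

```python
def flag_throughput_drops(samples_mbps, baseline_mbps=940, tolerance=0.1):
    """Return indices of speed-test samples more than `tolerance` below baseline.

    samples_mbps: iterable of results in Mbit/s, e.g. one per hour.
    baseline_mbps: expected wired-gigabit throughput across the link.
    """
    floor = baseline_mbps * (1 - tolerance)
    return [i for i, mbps in enumerate(samples_mbps) if mbps < floor]

# A run of samples through mixed weather, no drops flagged:
readings = [941, 938, 939, 936, 940, 942, 937]
assert flag_throughput_drops(readings) == []
```

Any flagged indices would point at the time of day worth correlating with the weather.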

This was a great result. The cost, including installation, was a little over 10% of the cabling price estimate.

Two points to note. The mounting brackets really require pole mounting as there is no horizontal adjustment available. Once in operation there’s very little information available about the state of the 60GHz link.

Mesh Portal
View from Mesh Portal
View from Mesh Point
Mesh Point
Throughput test

Can surveys be accurate?

A little while ago I read some fairly barbed comments from someone about the pointlessness and futility of using an Ekahau Sidekick for wireless surveys.

The argument went something along the lines of: because the Sidekick’s Wi-Fi interfaces and antennas are not the same as the devices actually using the network, the reported results are meaningless. The only way to survey realistically is to use the device the network is designed for.

These ideas weren’t really presented as part of a discussion, more a proclamation that anyone carrying out surveys using a Sidekick is producing misleading results. It’s quite the claim but the logic is hard to argue with, so does this position have any merit?

My immediate reaction, based mainly on my own experience, was “kinda, but not really”.

It’s true a network that looks good to the Sidekick can be problematic for a client like an iPhone 5, and this is entirely down to the high quality of the Sidekick antennas, especially relative to the design-compromised antenna found in a smartphone.

When analysing survey results in Ekahau Pro, an offset can be added to compensate for this. Working on a university campus I’ve always used -10dB, as this fits with the previously mentioned iPhone – the most common client.

What’s more, because Wi-Fi chipsets are not calibrated there can be significant variation between devices of the same type. Three iPhone 6 handsets will likely give you three different received signal levels.

So how do you know whether the client you’re using to carry out a representative test of the network is good, average or a poorly performing example? You can take multiple devices and take an average, or take the worst performing example and use that… but you still don’t know whether there’s another one that’s even worse.
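In practice about the best you can do is measure with several devices and work from the pessimistic end. A sketch of that, with entirely made-up readings:

```python
def client_rssi_summary(readings_dbm):
    """Summarise RSSI readings (dBm) from several nominally identical devices."""
    worst = min(readings_dbm)            # the most pessimistic device
    average = sum(readings_dbm) / len(readings_dbm)
    spread = max(readings_dbm) - worst   # device-to-device variation
    return worst, average, spread

# Three handsets of the same model, same spot, three different answers:
worst, avg, spread = client_rssi_summary([-68, -71, -74])
assert (worst, avg, spread) == (-74, -71.0, 6)
```

Designing to the worst reading is safer, but as noted above, there may always be a device out there that’s worse still.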

In other words, how do you apply any rigour to your surveying if nothing is accurate and devices vary? But it gets worse.

If you haven’t seen this already, take a look at this post by the brilliant WifiNigel. In a nutshell Nigel has demonstrated (with a nice little rig and measurements and everything) just how much the orientation of a device changes the received signal strength.

What Nigel’s work demonstrates is just how important it is to get your device offset right. If the network is designed for a VoIP client, it’s important to test that device while it’s just off vertical, gripped in a hand and held against a, presumably, human ear… not sitting horizontally on a desk at hip level…

So I still don’t agree with the assertion that using a Sidekick doesn’t provide useful data. Whilst the Sidekick is not calibrated to an accuracy any RF lab would find acceptable, each unit is tuned to a reference point, so ought to be more reliable than the network clients.

The key is knowing a realistic device offset to use and, as far as possible, basing that on devices as they are actually used – not sitting on a desk in a different orientation.

Access Points – like lightbulbs

A common challenge when deploying wireless networks is where to physically place the access points. APs are often not the most attractive devices – and they’re getting larger in the age of 802.11ax, with eight antennas or more!

Personally I don’t mind the look of most APs but when aesthetic concerns raise their head, it’s hard to help people understand that where the AP goes really does matter.

I’m currently working on a project that involves replacing APs nobody liked the look of so they were stuck above the ceiling tiles. The metal ceiling tiles. Basically hiding them behind a microwave door. Things work surprisingly well, considering.

The wireless nature of Wi-Fi leads people to believe it’s fine to hide APs out of sight, or place them in the most convenient location rather than the most effective.

There are ways of dealing with aesthetic concerns. Oberon is one company offering a wide variety of mounting options including ceiling tiles that recess the AP to make things more visually pleasing.

External antennas also offer the option of hiding that terribly ugly AP whilst ensuring the antenna is in the best location to serve clients.

The problem many wireless network engineers face is how to challenge the status quo. If currently the AP is shoved on top of a metal ceiling tile, facing the wrong way, it can be hard to argue the case for doing it properly.

My approach, and one that I’ve found to be reasonably well received, is to base my argument on manufacturer recommendations. It isn’t me saying this, it’s the hardware maker. I have pointed out in meetings that if the aircon is supposed to be installed horizontally on the ceiling, in the centre of the room, you’re unlikely to decide to put it vertically on one wall.

I also tend to compare APs with another device that emits radiation in the electromagnetic spectrum – lights.

Light fittings radiate with a specific pattern. The office I’m sitting in right now has LED ceiling tile panels. These throw light down over a large area – but crucially not much light goes out the other side. You wouldn’t put these upside down and expect to have a well lit office.

APs should be viewed the same way. The antenna pattern of any access point is part of the network design. To compromise this is to compromise the design. Exactly how you get this message across is part of the many soft skills required of this role.

Most important, least capable

An important principle when working on wireless network design is the most important but least capable device. The best example I’ve come across is a handheld barcode scanner used in a supermarket that has particularly poor RF performance and works only on the 2.4GHz band.

For this device to work well there needs to be what looks like a bad RF design with too many APs on the same channel, likely all running at reasonably high power levels. In fact a network that works well for this scanner might perform poorly for other uses, especially anything requiring a lot of throughput.

But it’s all about the requirements; in this case the requirements of a device that’s operationally important to the site and that is, in the parlance of ham radio, a bit deaf.

Another example I encountered on our campus was a student’s laptop – in this case a modern machine with an 802.11ac (Wi-Fi 5) network interface. The student had recently moved into the room and was experiencing issues with the network.

Checking the RF using a Fluke Netscout NetAlly AirCheck showed the 5GHz signal strength from the nearest AP was -67dBm, right on the design spec money. However this user’s laptop reported a received signal strength (RSSI) of -77dBm. My iPhone 6 “Real-World-Test-O-Meter” reported -70dBm, so who’s right? The answer is… all three.

As a general rule tablets outperform phones and laptops tend to be the best – it’s all about how much room there is for the antenna – after that it’s about how much money is thrown at clever design. Normally my iPhone 6 is a good real-world, worst-case scenario test because it’s fairly old now and the antenna designs have got better. However in this case we can see the laptop is really quite poor. From the infrastructure side it’s possible to see the strength of the signals received by the AP from the client – APs really ought to have the best RF performance of all – but this can still be a useful indicator. A reasonably safe working assumption is if the AP is reporting a low RSSI from the client, the client is probably picking up even less from the AP.

Because Wi-Fi equipment is not calibrated (nominally identical devices will report different signal strengths), whilst it’s fine to say the design is for -67dBm minimum across the service area, the question has to be asked: “As measured by what?”

The general rule I’ve come up with for the campus environment I support is to assume the least capable device likely to be seen on our network will ‘hear’ the signal at 10dB lower than the measuring equipment we use, either the AirCheck G2 or Ekahau Sidekick. This isn’t an exact science – I can’t ask all the students’ personal devices to report back their RSSI – more’s the pity.
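That rule of thumb is easy to mechanise when setting a survey target. A sketch – the function name is mine, and the 10dB offset is the working assumption described above, not a universal constant:

```python
def survey_target_dbm(client_min_dbm, device_offset_db=10):
    """Signal the survey tool must see so the least capable client still works.

    client_min_dbm: weakest signal the worst expected client can usefully hear.
    device_offset_db: assumed gap between the survey kit and that client.
    """
    return client_min_dbm + device_offset_db

# If the worst client needs -77dBm, the Sidekick needs to see -67dBm:
assert survey_target_dbm(-77) == -67
```

Which is exactly the pairing in the laptop story: -67dBm on the AirCheck, -77dBm at the client.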

It just so happens, in this case, it works out perfectly. There was a 10dB difference and resolving that fixed the student’s problems.

Sometimes it will be obvious which your least capable devices are, and which of those are the most important. Sometimes it isn’t and you’ll just discover them along the way. The most important thing to remember is it’s perfectly possible to design a network that performs brilliantly for device A and really badly for device B so if device B matters to you, make sure you calibrate your design parameters accordingly.

Network resilience

Every so often we experience a network outage because a piece of equipment fails. One switch we use across our campus has a power supply failure mode that trips the power, so one bad switch takes out everything. However, most of the time I’m impressed at just how resilient and reliable the kit is. Network switches in dirty, hot environments run reliably for years. In one case we had a switch with long-since-failed fans, in a room that used to reach 40°C. It finally fell over one hot summer’s day when the temperature hit 43°C. Even then it was OK once it had cooled down.

Just a bit damp

Most recently there was a water leak in a building. I say leak; a pressure vessel burst, so mains-pressure hot water poured through two floors of the building for a couple of hours.

Let’s not reflect on the building design that places active network equipment and the building power distribution boards next to the questionable plumbing but instead consider the life of this poor AP-105.

It had happily served clients for the past seven or eight years; now it was time for a shower. It died. Not surprising. What’s perhaps more surprising is that, once dried out, the AP functioned perfectly well.

This isn’t the first time water damage has been a problem for us. Investigating a user complaint with a colleague, we once found a switch subject to such a long-term water leak that it had limescale deposits across the front and the pins in its sockets had corroded. It was in a sad way, but even though the cabinet resembled Mother Shipton’s cave, the switch was still online.

I have seen network equipment from Cisco, HP, Aruba, Ubiquiti and Extreme, all subject to quite serious abuse in conditions far outside the environmental specifications.

This isn’t to suggest we should be cavalier in our attitude towards deployment conditions – rather to celebrate the level of quality and reliability that’s achieved in the design and manufacturing of the equipment we use.

Farewell sweet AP-93H

Towards the end of 2018 we marked the final Aruba AP-125 being decommissioned. A venerable workhorse of an AP, these units provided excellent, reliable service for a really long time. Now it’s the turn of another stalwart of our network estate – the AP-93H.

Aruba AP-93H

The H apparently stands for “hospitality” – or so I’m told; I’ve never checked. These APs fit over the top of a network socket, and they have been invaluable to us.

Aruba are not alone in making devices in this format – Cisco, Ubiquiti and others do the same – but in each case they solve a really big problem. Namely, we didn’t put enough wires in.

We’re replacing the AP-93H with Aruba’s current model, the AP-303H, but it isn’t just bedrooms that get the hospitality treatment.

I’ve written before about our challenges with asbestos, but also the need to have an agile and responsive approach to fast-changing network requirements. The hospitality units are a fantastic way to expand network capacity where there’s no available ceiling-level socket for one of our usual models, or maybe we’re not allowed to touch the ceiling anyway.

Stick an AP-303H over the top of a double socket and you can have Wi-Fi and four sockets available. Three of those network sockets can either tunnel that traffic back to the mobility controller or bridge locally to the switch – it’s up to you.

The AP-93H has, like most of the other hardware we’ve replaced, served us well. They’re end of life and not supported past Aruba OS 8.2, so they have had to be retired. They’re also single-radio APs: though technically dual-band, you can have either 2.4GHz or 5GHz, not both. So we welcome the upgraded version, perhaps wish the mounting plates were the same, and carry on with the upgrades.

Aruba OS8 cluster upgrade

I’m far from the first to share thoughts about such things, but I saw a live demo of an Aruba OS8 cluster being upgraded at the company’s Atmosphere event a couple of years ago. Controllers that were serving the conference were upgraded live, while we all checked our devices to confirm we were still online.

The live cluster upgrade is probably one of the biggest headline features of AOS8. There are others I particularly like, but that’s for another time. The process works best if you have a reasonable amount of cell overlap in your design.

First, all the clients and APs are moved away from one controller in the cluster, and this is upgraded. Once it comes online and syncs up it becomes a cluster master, in a cluster of one. Then a group of APs (AOS calls this a partition) is selected; the aim is that none of the APs selected will be neighbours of each other. The new firmware is pre-loaded onto the partition of APs. Next, AOS encourages clients to leave these APs, using all the tools of ClientMatch it can, before the APs reboot. The aim is that clients will be able to associate with a neighbouring AP.

Once the upgraded APs come back up they’ll join the upgraded controller, and so the process rumbles on. If you have multiple controllers, at some point AOS will upgrade the other controllers and expand the cluster.
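The partitioning step amounts to picking groups from the AP neighbour graph such that no two members of a group are RF neighbours. A rough sketch of the idea – this is not Aruba’s actual algorithm, and the neighbour data is invented:

```python
def partition_aps(neighbours):
    """Greedily split APs into reboot groups where no two group members
    are RF neighbours, so clients always have somewhere to roam.

    neighbours: dict mapping AP name -> set of neighbouring AP names.
    """
    remaining = set(neighbours)
    partitions = []
    while remaining:
        group, blocked = set(), set()
        for ap in sorted(remaining):       # deterministic order
            if ap not in blocked:
                group.add(ap)
                blocked |= neighbours[ap]  # its neighbours must wait
        partitions.append(group)
        remaining -= group
    return partitions

# Four APs in a row: alternate APs can reboot together, in two waves.
graph = {"ap1": {"ap2"}, "ap2": {"ap1", "ap3"},
         "ap3": {"ap2", "ap4"}, "ap4": {"ap3"}}
assert partition_aps(graph) == [{"ap1", "ap3"}, {"ap2", "ap4"}]
```

The better the cell overlap in the design, the more an approach like this can do for clients during the reboots.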

For always on networks like a university campus, hospital or airport, this is a great step forward as it allows much more regular code upgrades. A good thing.

However, and there always has to be a downside doesn’t there, it doesn’t always go quite as expected.

I performed a cluster upgrade from 8.3 to 8.4 and it took a long time – about 17 hours to upgrade 2500 APs and four controllers. APs can get into a state where the pre-load doesn’t work. Rebooting the AP will fix this, but AOS doesn’t do that. Instead it retries the pre-load five times at 10-minute intervals, which results in that partition of APs taking almost an hour. If you have one AP in each partition that doesn’t behave, the entire process drags out for a really long time. Aruba have acknowledged this is an issue and I expect eventually there’ll be a fix or workaround.

So you have a choice: you can do a one-hit reboot of all the controllers into the new firmware, just as we always did, or you can do a cluster upgrade. One is easier to communicate to people; the other is likely to require communication… it all depends on your network design. If you’re confident your cell overlap is really optimal, it perhaps doesn’t matter how long the upgrade takes.

Wi-Fi Design in Higher Ed

We all know the first stage in Wi-Fi design is to gather the requirements. But what if you can’t? Or you know full well that whatever requirements are outlined, they’ll probably change tomorrow. What if gathering all the potential stakeholders together in order to work out what on earth they actually all want is impossible? What if you have to support pretty much any client that’s been on the market in the last 5-10 years? Welcome to designing Wi-Fi for Higher Ed.

BYOD has been a thing for some time now, as people expect to use their personal mobile devices on the company Wi-Fi, but even among corporates that have embraced this, there’s usually a high level of control underpinning assumptions about the network. Employees are often issued laptops/phones that are managed with group policy or some form of Mobile Device Management (MDM). IT can push out drivers and can usually decide what hardware is supported. For genuine BYOD there’s usually a policy to determine what devices are supported, and that list is reasonably well controlled, limited to business need.

In the HE sector that doesn’t apply. Yes, we have managed laptops, which are known hardware running a centrally controlled build, but the majority of devices on our network are students’ personal laptops and phones. We provide services to academic departments, and if they decide they’re going to buy whatever they like and then expect us to make it work… that’s pretty much what we’re there for.

Then there’s the question of what users are going to be doing… and we don’t know. Users and research groups move around. Recently a team with a habit of running complex, high-bandwidth SQL queries over the Wi-Fi moved out of the office where the network met their needs to a different building where it didn’t, and the first I knew about it was complaints that the Wi-Fi was down. (It wasn’t down, it was just busy – but more on that another time.)

Yes, there are some communication and policy problems where improvements could be made, but the key to designing a network well for HE is to be flexible and, as far as possible, do what you can to make things futureproof.

“Hahahahaha…. Futureproof” I hear you guffaw. Indeed, what this means practically is making sure we have enough wires installed for future expansion. Our spec is for any AP location to have a double data socket, and we put in more locations than we intend to use, precisely to allow flexibility. This can be a hard sell when the budget is being squeezed, but it has paid off many times, and is worth fighting for.

A key focus is the student experience. We prioritise delivering a good service to study bedrooms – something that has required wholesale redeployment of Wi-Fi in some buildings.

And so, dear reader, you’ll realise that we do have some requirements defined. Experience and network monitoring tells us we have a high proportion of Apple iOS devices on the network – so coverage is based on what we know helps iPhones roam well. We know how much bandwidth our users typically use – it’s surprisingly little, but we have to support Netflix in the study bedrooms.

We use DFS channels across our campus, we use a four channel plan on 2.4GHz – both bad news according to some people, but it works.

Perhaps the most important aspect of providing Wi-Fi for the weird combination of enterprise, domestic, education and unpredictable research demands that Higher Ed brings is to make sure you can do it. The second you tell that professor of robotics they can’t connect to your Wi-Fi, a dozen rogue networks will pop up. Agile, flexible, on demand network design is hard work but it’s easier than firefighting the wall of interference from that swarm of robots…. or is that a different movie I’m thinking of?

The state of 802.11r

802.11r, or Fast BSS Transition, is a way of significantly increasing the speed of roaming in enterprise Wi-Fi environments. More specifically, it’s a standard that describes how to avoid having to perform a full 802.1X authentication when roaming to a new AP.

For the vast majority of clients Fast-Transition (FT) is no big deal. If a user is idly browsing the web, or their client is performing some background sync tasks, a roam from one AP to another isn’t something they notice.

Where it matters is when there are VoIP clients, or any application that doesn’t tolerate packet loss or high latency.

The 802.1X authentication is quite a chatty affair, and it takes a bit of time. The hallowed figure often quoted for VoIP clients is to keep latency under 150ms. The process of roaming to a new AP and then performing 802.1X can take longer than this.

With FT, the network authentication part of the roam is reduced dramatically. However, it does need to be supported by the client and network.
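A back-of-envelope sketch shows why. The round-trip counts and timings here are illustrative assumptions, not measured figures:

```python
def roam_time_ms(radius_rtt_ms, eap_round_trips, air_overhead_ms=20):
    """Rough roam cost: over-the-air (re)association overhead plus however
    many trips to the RADIUS server the authentication needs."""
    return air_overhead_ms + eap_round_trips * radius_rtt_ms

# Assumed: a full 802.1X/EAP exchange needing ~10 round trips to a RADIUS
# server 20ms away, versus a fast transition needing no RADIUS trips.
full_auth = roam_time_ms(20, 10)   # 220ms - blows the 150ms voice budget
ft_roam = roam_time_ms(20, 0)      # 20ms
```

Even generous guesses make the point: the RADIUS round trips dominate, and FT removes them from the roam.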

Like many standards, 802.11r left some elements open to interpretation. Some network vendors required a client to support 802.11r in order to connect to the network, others did not. With our Aruba network it was necessary for all clients to support 802.11r before you could switch it on.

This has changed over time; my understanding is most vendors have moved to a position of allowing clients that do and do not support FT to co-exist on the network. Apple’s macOS does not support 802.11r (at the time of writing) but happily coexists with it.

Windows 10 includes support, but this seems to be dependent on the Wi-Fi chipset and driver. Which brings me to the blunt message of this post.

In an Aruba OS8 environment, with a jumbled mix of clients, it’s not yet possible to enable 802.11r. I’ve just tried it and ran into a couple of Windows 10 laptops that were no longer able to connect to the network. The symptom observed was the EAP transaction with RADIUS timing out. The user experience was, of course, “the wifi is down”.