The clients you can’t control

You’ve just upgraded the network with the latest Wi-Fi 6 APs – this promises to be faster, with lower latency and all round better for everyone and everything… great! But… there are rumblings.

During your testing you found a number of the corporate laptops used an Intel Wi-Fi NIC and the driver had never been updated… these hit a well known bug that causes the driver to ignore Wi-Fi 6 enabled BSSIDs. No problem: because you did the testing, that issue was found and a new driver was deployed.

Despite all your efforts, a number of helpdesk calls have come in from users complaining they can’t connect to the network any more. Some of them can’t even see the network…. Hmmm.

Turns out these machines haven’t had the new driver deployed by Group Policy because they’re not part of the domain. They’re BYOD, they have the same ancient driver and they won’t play ball with the 802.11ax network you’ve just deployed.

That’s not all. The old network didn’t have any of the roaming enhancements enabled and with all change it seemed the perfect opportunity to enable them: 802.11k/v/r all switched on.

Some of the misbehaving laptops can connect to the network, sometimes, but things are really unreliable. These also have an older Intel 7260 Wi-Fi chipset but updating the driver doesn’t help.

You’ve been struck by another Intel bug, where the presence of the 802.11k Quiet information element upsets things and they break. This time it’s a hardware problem.

So do you switch off 802.11ax and 802.11k on any SSIDs used for BYOD or do you say “tough, your old stuff might not work any more”?

That, of course, is a policy matter.

When I encountered both these issues in a recent deployment, the decision was to take the path of least resistance and disable the functions. This means the network can’t benefit from the performance and capacity improvements offered by Wi-Fi 6.

Not having any control over BYOD clients means they may end up dictating terms for the network. That might be fine, if it fits the policy, but in this scenario it was done because it was easiest: it made the problem go away. If that decision isn’t revisited later the network will always be operating below its potential.

AAA on the Wi-Fi

If you know any enterprise networking you’ll have come across AAA – Authentication, Authorization and Accounting. The cornerstone of network security is ensuring the client can be authenticated, so not just anyone can connect – and often that first A is as far as it goes.

So what of Authorization?

This is what provides a way of making your Wi-Fi more efficient. If you have corporate devices, BYOD, IoT and they currently have three separate SSIDs (not uncommon) you can put all three onto the same SSID, reducing management traffic, and use the Authorization part of AAA to determine what network access each client should have.

This might use Active Directory group membership to determine which VLAN a user gets dropped into, or what ACLs are applied. In many systems both of these are part of a role, and Role Based Access Control is the term used for this.
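As a rough illustration of that authorization step, here is a minimal Python sketch of mapping group membership to a role that carries a VLAN and an ACL. The group names, VLAN IDs and ACL names are all invented for the example, and this is not how any particular RADIUS server expresses its policy; it only shows the shape of the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    vlan: int
    acl: str


# Hypothetical policy table: every group name, VLAN ID and ACL name here
# is made up for illustration. Order matters: the first match wins.
ROLE_BY_GROUP = {
    "staff-laptops": Role("corporate", vlan=10, acl="allow-internal"),
    "byod-users": Role("byod", vlan=20, acl="internet-only"),
    "iot-devices": Role("iot", vlan=30, acl="iot-restricted"),
}

# Anything that authenticates but matches no group gets the most
# restrictive role rather than being rejected outright (an assumption).
DEFAULT_ROLE = Role("guest", vlan=99, acl="deny-all")


def authorize(groups):
    """Return the role for a client given the directory groups it belongs to."""
    for group, role in ROLE_BY_GROUP.items():
        if group in groups:
            return role
    return DEFAULT_ROLE


if __name__ == "__main__":
    print(authorize(["byod-users"]))
    print(authorize(["some-other-group"]))
```

All three client types can then share one SSID, with the role deciding what each client is actually allowed to reach.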

Accounting is where you see what the user actually did. In practice this usually takes the form of when they connected to the network, how long for and how much data they transferred.

Here’s how it comes together in an example recent proof of concept for a customer:

Multiple departments are in a building and it’s necessary to provide security, keeping traffic from each dept separate. For regulatory purposes it’s necessary to assign network services costs to departments but this has to be based on real world information such as bandwidth use. Finally “we’re all one company so we don’t want to set up separate networks”.

Most networks have pretty much everything in place to do this, it’s just a question of whether all the dots are joined.

A RADIUS server (Aruba ClearPass in this case but something like Cisco ISE could be used) is already used for 802.1X authentication. Users from different AD groups can be assigned different roles, placing them in their department’s VLAN or simply applying ACLs specifying the access the client should have. The APs or controller are configured to send RADIUS accounting, which allows the administrator to accurately determine the data traffic used for each connection.

Everything needed to do this has been around for quite a long time, but an awful lot of networks out there still have one SSID per VLAN, an SSID for each client type, and an SSID for each day of the week. There are much better ways to do this.
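To make the cost allocation part concrete, here is a small Python sketch that totals traffic per department from accounting records and splits a bill in proportion. Acct-Input-Octets and Acct-Output-Octets are standard RADIUS accounting attributes; the "department" field, the idea that records arrive as dictionaries, and the figures in the usage example are assumptions made for illustration. It also ignores the gigaword counters used when the 32-bit octet counters wrap.

```python
from collections import defaultdict


def usage_by_department(records):
    """Sum octets in both directions for each department.

    Each record is assumed to be a dict holding the two standard octet
    counters plus a hypothetical "department" key derived from the role.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[rec["department"]] += (
            rec["Acct-Input-Octets"] + rec["Acct-Output-Octets"]
        )
    return dict(totals)


def cost_shares(totals, total_cost):
    """Split total_cost across departments in proportion to traffic."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {dept: 0.0 for dept in totals}
    return {
        dept: total_cost * octets / grand_total
        for dept, octets in totals.items()
    }


if __name__ == "__main__":
    # Invented sample data, not taken from the proof of concept.
    sample = [
        {"department": "physics", "Acct-Input-Octets": 100, "Acct-Output-Octets": 200},
        {"department": "history", "Acct-Input-Octets": 40, "Acct-Output-Octets": 60},
    ]
    totals = usage_by_department(sample)
    print(totals)
    print(cost_shares(totals, 1000.0))
```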

Life as a creative

At the risk of sounding ‘arty-farty’ and woowoo it’s my opinion that to be human is to be creative and anytime we dismiss what we, or someone else does as “just technical” some essential human quality is being denied. Let me try and make some sense of that.

If you spend any time reading the tweeted thoughts of louder members of the Wi-Fi expert community, you’ll realise they don’t all get on. Sometimes this is the result of personality clashes or political differences, but often there’s strong disagreement about how one should implement Wi-Fi. So if two highly experienced engineers disagree and sling the mud over how Wi-Fi works, what chance do the rest of us mortals have of understanding it? Is the real problem we’ve fallen for the lie that working in an engineering role is not creative?

Gigabit wireless point to point with 802.11ad WiGig

The Aruba AP-387 was launched a little while ago now; I first saw it demonstrated at Aruba Atmosphere 2018 in Croatia. It’s an AP designed for point to point links with 802.11ac 5GHz and 802.11ad 60GHz radios. The aggregate RF throughput is in the region of 2.5Gbps which means it can maintain duplex gigabit wired speeds and testing has shown this to be reliable up to the specified 400 metres. Should conditions cause a deterioration of the 60GHz link, the 5GHz link will continue to provide connectivity, albeit with lower throughput.

Installation is relatively straightforward. The APs don’t need super accurate aiming as the 802.11ad spec includes electronic alignment of the antenna. This can also cope with wind motion, though the more stable and well aimed the units, the better.

This install was between a sports facility building and a remote cabin beside the cycle track. The cabin had been without connectivity since construction as, for reasons nobody can explain, no data ducting was installed alongside the power feed. Attempts to drag fibre through the power duct failed and costs for ducting and fibre installation were priced at around £25k.

The AP-387 is what Aruba refer to as a unified AP so a pair can be standalone and managed as Instant APs or they can work as part of an Aruba controller environment – as was the case here. The link uses Aruba’s mesh configuration with one mesh portal and one mesh point.

This link was configured to use UNII-3 Band C channels on the 5GHz radio as the institution had an Ofcom license for outdoor use. (Note these channels are now available for license free use indoors only at low power as well as outdoor at high power with a license… not that that gets confusing at all.)

The initial setup on the bench was very straightforward. The installation was handed over to experienced cabling contractors with no specific wireless expertise or specialist equipment.

And it just worked.

The AP behaves as a bridge by default, passing all tagged VLANs across the link. This network uses the same management VLAN for APs and switches, so the only deviation from standard edge switch config at the remote end was to untag this management VLAN on the uplink port.

The link length was approx 190 metres and I kept an eye on it during some quite mixed weather using a regular automated speed test. No performance drop was observed during heavy rain or fog.

This was a great result. The cost, including installation, was a little over 10% of the cabling price estimate.

Two points to note. The mounting brackets really require pole mounting as there is no horizontal adjustment available. Once in operation there’s very little information available about the state of the 60GHz link.

[Photos: mesh portal, view from mesh portal, view from mesh point, mesh point, throughput test]

Can surveys be accurate?

A little while ago I read some fairly barbed comments from someone about the pointlessness and futility of using an Ekahau Sidekick for wireless surveys.

The argument went something along the lines of: because the Sidekick’s Wi-Fi interfaces and antennas are not the same as the devices actually using the network, the reported results are meaningless. The only way to survey realistically is to use the device the network is designed for.

These ideas weren’t really presented as part of a discussion, more a proclamation that anyone carrying out surveys using a Sidekick is producing misleading results. It’s quite the claim but at first glance the logic is hard to argue with, so does this position have any merit?

My immediate reaction, based mainly on my own experience, was “not really”.

It’s true a network that looks good to the Sidekick can be problematic for a client like an iPhone 5, and this is entirely down to the high quality of the Sidekick antennas, especially relative to the design-compromised antenna found in a smartphone.

When analysing survey results in Ekahau an offset can be added to compensate for this. Working on a university campus I’ve always used -10dB as this fits with the previously mentioned iPhone – the most common client.
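As a simple sketch of what that offset does to the numbers, the Python below shifts each survey reading by -10dB and flags locations that would fall below a design target for the client. The -10dB figure is the one I use; the -67dBm target and the location readings are assumptions for the example, and none of this reflects how Ekahau applies the offset internally.

```python
# Offset from the text: the client is assumed to hear 10 dB less than
# the survey device does.
DEVICE_OFFSET_DB = -10

# Assumed design target for the client, in dBm.
TARGET_DBM = -67


def client_estimate(survey_dbm, offset_db=DEVICE_OFFSET_DB):
    """Estimate what the client would receive from a survey reading."""
    return survey_dbm + offset_db


def failing_points(readings, target_dbm=TARGET_DBM, offset_db=DEVICE_OFFSET_DB):
    """Return the locations where the offset reading drops below target."""
    return [
        location
        for location, dbm in readings.items()
        if client_estimate(dbm, offset_db) < target_dbm
    ]


if __name__ == "__main__":
    # Invented survey readings in dBm, as seen by the survey device.
    readings = {"corridor": -55, "office": -60, "stairwell": -50}
    print(failing_points(readings))
```

A spot that looks comfortable at -60dBm on the survey device becomes -70dBm for the client, which is the whole point of applying the offset before judging coverage.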

What’s more, because Wi-Fi chipsets are not calibrated there can be significant variation between devices of the same type. Three iPhone 6 handsets will likely give you three different received signal levels.

So how do you know whether the client you’re using to carry out a representative test of the network is good, average or a poorly performing example? You can take multiple devices and take an average, or take the worst performing example and use that… but you still don’t know whether there’s another one that’s even worse.

In other words how do you apply any rigor to your surveying if nothing is accurate and devices vary? But it gets worse.

Take a look at this post by the brilliant WifiNigel. Nigel has demonstrated (with a nice little rig and measurements and everything) just how much the orientation of a device changes the received signal strength.

What Nigel’s work demonstrates is just how important it is to get your device offset right. If the network is designed for a VoIP client, it’s important to test that device while it’s just off vertical, gripped in a hand and held against a, presumably, human ear… not sitting horizontally on a desk at hip level…

Whilst the Sidekick is not calibrated with accuracy any RF lab would find acceptable, it is tuned to a reference point so ought to be more reliable than the network clients.

It is key to know what is a realistic device offset to use and as far as possible that needs to be based on devices in use, not sitting on a desk in a different orientation.

Access Points – like light fittings

A common problem when deploying wireless networks is the challenge of where to physically place the Access Points. APs are often not the most attractive devices – they’re also getting larger in the age of 802.11ax with eight antennas, or more!

Personally I don’t mind the look of most APs but when aesthetic concerns raise their head, it’s hard to help people understand that where the AP goes really does matter.

I’m currently working on a project that involves replacing APs nobody liked the look of so they were stuck above the ceiling tiles. The metal ceiling tiles. Basically hiding them behind a microwave door. Things work surprisingly well, considering.

The wireless nature of Wi-Fi leads people to believe it’s fine to hide APs out of sight, or place them in the most convenient location rather than the most effective.

There are ways of dealing with aesthetic concerns. Oberon is one company offering a wide variety of mounting options including ceiling tiles that recess the AP to make things more visually pleasing.

External antennas also offer the option of hiding that terribly ugly AP whilst ensuring the antenna is in the best location to serve clients.

The problem many wireless network engineers face is how to challenge the status quo. If currently the AP is shoved on top of a metal ceiling tile, facing the wrong way, and things sort of work, it can be hard to argue the case for doing it properly.

My approach, and one that I’ve found to be reasonably well received and understood, is to base my argument on manufacturer recommendations. It isn’t me saying this, it’s the hardware maker. I have pointed out in meetings that if the aircon is supposed to be installed horizontally on the ceiling, in the centre of the room, you’re unlikely to decide to put it vertically on one wall.

I also tend to compare APs with another device that emits radiation in the electromagnetic spectrum – lights.

Light fittings radiate with a specific pattern. The office I’m sitting in right now has LED ceiling tile panels. These throw light down over a large area with, probably, a 120 degree beam pattern so crucially not much light goes out the other side into the ceiling void. You wouldn’t put these upside down and expect to have a well lit office.

APs should be viewed the same way. The antenna pattern of any access point is part of the network design. To compromise this is to compromise the design. Exactly how you get this message across is one of the soft skills required by the wireless network engineer.

Most important, least capable

An important principle when working on wireless network design is the most important but least capable device. The best example I’ve come across is a handheld barcode scanner used in a supermarket that has particularly poor RF performance and works only on the 2.4GHz band.

For this device to work well there needs to be what looks like a bad RF design with too many APs on the same channel, likely all running at reasonably high power levels. In fact a network that works well for this scanner might perform poorly for other uses, especially anything requiring a lot of throughput.

But it’s all about the requirements; in this case the requirements of a device that’s operationally important to the site and that is, in the parlance of ham radio, a bit deaf.

Another example I encountered on our campus was a student’s laptop. In this case a modern machine with an 802.11ac (Wi-Fi 5) network interface. The student had recently moved into the room and they were experiencing issues with the network.

Checking the RF using a Fluke Netscout Netally Aircheck showed the 5GHz signal strength from the nearest AP was -67dBm, right on the design spec money. However this user’s laptop reported a received signal strength (RSSI) of -77dBm. My iPhone 6 “Real-World-Test-O-Meter” reported -70dBm, so who’s right? The answer is… all three.

As a general rule tablets outperform phones and laptops tend to be the best – it’s all about how much room there is for the antenna – after that it’s about how much money is thrown at clever design. Normally my iPhone 6 is a good real-world, worst-case scenario test because it’s fairly old now and the antenna designs have got better. However in this case we can see the laptop is really quite poor. From the infrastructure side it’s possible to see the strength of the signals received by the AP from the client – APs really ought to have the best RF performance of all – but this can still be a useful indicator. A reasonably safe working assumption is if the AP is reporting a low RSSI from the client, the client is probably picking up even less from the AP.

Because Wi-Fi equipment is not calibrated (nominally identical devices will report different signal strength), whilst it’s fine to say the design is for -67dBm minimum across the service area, the question has to be asked: “As measured by what?”

The general rule I’ve come up with for the campus environment I support is to assume the least capable device likely to be seen on our network will ‘hear’ the signal at 10dB lower than the measuring equipment we use, either the Aircheck G2 or Ekahau Sidekick. This isn’t an exact science – I can’t ask all the students’ personal devices to report back their RSSI – more’s the pity.

It just so happens that, in this case, it worked out perfectly. There was a 10dB difference and resolving that fixed the student’s problems.
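The arithmetic behind that is worth spelling out. This short Python sketch takes the three readings from the example above (Aircheck -67, laptop -77, iPhone -70), works out each client’s offset relative to the measuring tool, and then asks what the tool needs to read for the weakest client to see the design target. The function names are my own for the purpose of the example.

```python
def device_offset(reference_dbm, client_dbm):
    """How much lower (negative) the client reads than the reference tool."""
    return client_dbm - reference_dbm


def required_reference_level(target_dbm, offset_db):
    """Level the reference tool must read for the client to see target_dbm."""
    return target_dbm - offset_db


if __name__ == "__main__":
    aircheck = -67  # dBm, measuring tool
    clients = {"laptop": -77, "iphone6": -70}  # dBm, same spot

    offsets = {name: device_offset(aircheck, dbm) for name, dbm in clients.items()}
    worst = min(offsets.values())

    print(offsets)
    print(worst)
    print(required_reference_level(-67, worst))
```

With these numbers the laptop is 10dB down and the phone 3dB down, so for the laptop to see -67dBm the measuring tool has to read -57dBm in that spot.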

Sometimes it will be obvious which your least capable devices are, and which of those are the most important. Sometimes it isn’t and you’ll just discover them along the way. The most important thing to remember is it’s perfectly possible to design a network that performs brilliantly for device A and really badly for device B so if device B matters to you, make sure you calibrate your design parameters accordingly.

Network resilience

Every so often we experience a network outage because a piece of equipment fails. One switch we use across our campus has a power supply failure mode that trips the power, so one bad switch takes out everything. However, most of the time I’m impressed at just how resilient and reliable the kit is. Network switches in dirty, hot environments run reliably for years. In one case we had a switch with long since failed fans, in a room that used to reach 40°C. It finally fell over one hot summer’s day when the temperature hit 43°C. Even then it was ok once it had cooled down.

Just a bit damp

Most recently there was a water leak in a building. I say leak, a pressure vessel burst so mains pressure hot water poured through two floors of the building for a couple of hours.

Let’s not reflect on the building design that places active network equipment and the building power distribution boards next to the questionable plumbing but instead consider the life of this poor AP-105.

Happily serving clients for the past seven or eight years, it was time for a shower. It died. Not surprising. What’s perhaps more surprising is once dried out the AP functioned perfectly well.

This isn’t the first time water damage has been a problem for us. Investigating a user complaint with a colleague, we once found a switch subject to such a long term water leak it had limescale deposits across the front and the pins in the sockets had corroded. It was in a sad way but even though the cabinet resembled Mother Shipton’s cave, the switch was still online.

I have seen network equipment from Cisco, HP, Aruba, Ubiquiti, Extreme, all subject to quite serious abuse in conditions that are far outside the environmental specifications.

This isn’t to suggest we should be cavalier in our attitude towards deployment conditions – rather to celebrate the level of quality and reliability that’s achieved in the design and manufacturing of the equipment we use.

Farewell sweet AP-93H

Towards the end of 2018 we marked the final Aruba AP-125 being decommissioned. A venerable workhorse of an AP, these units provided excellent, reliable service for a really long time. Now it’s the turn of another stalwart of our network estate – the AP-93H.

Aruba AP-93H

The H apparently stands for “hospitality”, or so I’m told… I’ve never checked, and these APs fit over the top of a network socket. They have been invaluable to us.

Aruba are not alone in making devices in this format. Cisco, Ubiquiti and others do the same, but in each case they solve a really big problem. Namely we didn’t put enough wires in.

We’re replacing the AP-93H with Aruba’s current model the AP-303H but it isn’t just bedrooms that get the hospitality treatment.

I’ve written before about our challenges with asbestos, but also the need to have an agile and responsive approach to fast changing network requirements in academic and research environments. The hospitality units are a fantastic way to expand network capacity where there’s no available ceiling level socket for one of our usual models, or maybe we’re not allowed to touch the ceiling anyway.

Stick an AP-303H over the top of a double socket and you can have Wi-Fi and four sockets available. Three of those network sockets can either tunnel that traffic back to the mobility controller or bridge locally to the switch – it’s up to you.

The AP-93H has, like most of the other hardware we’ve replaced, served us well. They’re end of life and not supported past Aruba OS 8.2 and so they have had to be retired. Although they’re dual band, these APs are only single radio so you can have either 2.4GHz or 5GHz, not both. So we welcome the upgraded version, perhaps wish the mounting plates were the same, and carry on with the upgrades.

Aruba OS8 cluster upgrade

I’m far from the first to share my thoughts about such things, but I saw a live demo of an Aruba OS8 cluster being upgraded at the company’s Atmosphere event a couple of years ago. Controllers that were serving the conference were upgraded live, while we all checked our devices to confirm that we were still online.

The live cluster upgrade is probably one of the biggest headline features of AOS8. There are others I particularly like, but that’s for another time. The process works best if you have a reasonable amount of cell overlap in your design.

First all the clients and APs are moved away from one controller in the cluster, and this is upgraded. Once it comes online and syncs up it becomes a cluster master, in a cluster of one. Then a group of APs (AOS calls this a partition) is selected. The aim is that none of the APs selected will be neighbours of each other. The new firmware is pre-loaded onto the partition of APs. Next AOS encourages clients to leave these APs, using all the tools of ClientMatch it can, before the APs reboot. The aim is clients will be able to associate with a neighbouring AP.
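I don’t know how AOS actually chooses its partitions, but the idea of grouping APs so that no two in a group are neighbours is a graph colouring problem, and a simple greedy version is easy to sketch in Python. The neighbour map, the AP names and the most-connected-first ordering are all assumptions for illustration; this is not Aruba’s algorithm.

```python
def partition_aps(neighbours):
    """Group APs so that no two APs in the same group are neighbours.

    neighbours maps each AP name to the set of APs it can hear. The map
    is assumed to be symmetric (if A lists B, B lists A).
    """
    partition_of = {}

    # Greedy colouring: place the most-connected APs first, each into the
    # lowest-numbered partition not already used by one of its neighbours.
    for ap in sorted(neighbours, key=lambda a: (-len(neighbours[a]), a)):
        used = {partition_of[n] for n in neighbours[ap] if n in partition_of}
        index = 0
        while index in used:
            index += 1
        partition_of[ap] = index

    count = max(partition_of.values(), default=-1) + 1
    partitions = [[] for _ in range(count)]
    for ap, index in partition_of.items():
        partitions[index].append(ap)
    return [sorted(group) for group in partitions]


if __name__ == "__main__":
    # Four hypothetical APs along a corridor, each hearing its neighbours.
    corridor = {
        "ap1": {"ap2"},
        "ap2": {"ap1", "ap3"},
        "ap3": {"ap2", "ap4"},
        "ap4": {"ap3"},
    }
    for group in partition_aps(corridor):
        print(group)
```

Each group can then be rebooted in turn while its neighbours stay up, which is why decent cell overlap matters so much for this process.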

Once the upgraded APs come back up they’ll join the upgraded controller and so the process rumbles on. If you have multiple controllers at some point AOS will upgrade other controllers and expand the cluster.

For always on networks like a university campus, hospital or airport, this is a great step forward as it allows much more regular code upgrades. A good thing.

However, and there always has to be a downside doesn’t there, it doesn’t always go quite as expected.

I performed a cluster upgrade from 8.3 to 8.4 and it took a long time. In fact it took about 17 hours to upgrade 2500 APs and four controllers. APs can get into a state where the software pre-load doesn’t work. Rebooting the AP will fix this but AOS doesn’t do that. Instead it retries the pre-load five times at 10 minute intervals. This results in the partition of APs taking almost an hour. If you have one AP that doesn’t behave in each partition the entire process drags out for a really long time. Aruba have acknowledged this is an issue and I expect eventually there’ll be a fix or workaround.
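A back-of-envelope model shows why a single misbehaving AP per partition hurts so much. In this Python sketch the five retries at ten minute intervals come from the behaviour described above; the number of partitions and the time a healthy partition takes are invented purely to show the shape of the sum, and are not measurements from my upgrade.

```python
def preload_stall_minutes(retries=5, interval_minutes=10):
    """Minutes a partition waits while a failed pre-load is retried."""
    return retries * interval_minutes


def estimated_upgrade_hours(partitions, healthy_minutes, stalled_partitions,
                            retries=5, interval_minutes=10):
    """Rough total duration in hours.

    partitions and healthy_minutes are hypothetical inputs; the stall for
    each affected partition is added on top of its healthy time.
    """
    stall = preload_stall_minutes(retries, interval_minutes)
    total_minutes = partitions * healthy_minutes + stalled_partitions * stall
    return total_minutes / 60


if __name__ == "__main__":
    # Hypothetical: 12 partitions that each take 10 minutes when healthy.
    print(estimated_upgrade_hours(12, 10, stalled_partitions=0))
    print(estimated_upgrade_hours(12, 10, stalled_partitions=12))
```

With those made-up inputs, one bad AP in every partition turns a two hour job into a twelve hour one, without a single extra AP being upgraded.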

So you have a choice – you can do a one hit reboot of all the controllers into new firmware, just as we always did, or you can do a cluster upgrade. One is easy to communicate to people, the other might not need any comms at all… it depends on your network design. If you’re confident of cell overlap being really optimal, it perhaps doesn’t matter how long the upgrade takes because your users will hardly notice.