Ekahau Connect

One of the tools that’s made the most difference to my work with WiFi has to be Ekahau Site Survey (now known as Ekahau Pro) and it’s now better than ever. I’m just going to go straight for the exciting part. Ekahau Connect lets you plug your Ekahau Sidekick into your iPad for surveying and, yes, that is as lightweight and functionally glorious as it sounds. But there’s more…

Ekahau have turned what was one application, Ekahau Site Survey, into a suite that forms Ekahau Connect. There’s Ekahau Pro – the Windows/Mac application that many WiFi professionals know and love; Ekahau Capture – a packet capture utility for the Sidekick; Ekahau Cloud – a cloud sync service; and Ekahau Survey – an iPad app used for surveying with the Sidekick.

To get the advantage of all this new goodness it’s pretty clear that you need a Sidekick. I found the Sidekick to be a worthwhile investment from the get-go, but now that I can connect my Sidekick to my iPad, it’s become something of a must-have.

For surveying, for me, it’s transformative. The Sidekick and iPad combination is lightweight with long battery life and much easier to operate on the move. Panning and zooming around the floor plan is so much smoother and easier on the iPad, and that really matters when you’re on your feet and also having to negotiate obstacles in your path.

I’ve been using ESS for the last few years and have always struggled to come up with a really satisfactory workflow for surveys. In part that’s because I’m often dealing with small academic offices (the offices are small, not the academics) which are not always easy to move around, and the doors all have aggressive auto-closers that try to eat my laptop. In short, I’m usually fighting piles of paper, books and doors, all while ensuring I’m being accurate with my location clicking on the floor plan. Even the lightest laptop starts to feel heavy after a while. I’ve been using a Lenovo Yoga for its fold-over touchscreen design, and whilst it’s easier to carry around, it’s actually fairly hard work to operate because Windows and touch have never really gelled.

On the iPad it’s a different story. For a little while I’ve been playing with the beta of Ekahau Survey as the team beat back the rough edges (there really weren’t very many) and took on board feedback from everyone giving it a spin.

Using an iPad I can survey more quickly, make fewer errors that I need to correct, and keep going for longer. It’s a real productivity boost.

The workflow is pretty straightforward. Create your project in Ekahau Pro then export it either to Ekahau Cloud or to the internal storage of your Sidekick. The latter option is particularly useful when surveying a site where you don’t have internet access for your own device. The Ekahau team have talked a lot about how they ensure data isn’t lost if there’s a crash or the battery dies, by saving data to the iPad, the cloud service and the Sidekick.

From the moment I got my Sidekick I’ve wondered how long it would be before there was a packet capture utility… and now it’s here. I didn’t have advance information; it was just an obvious use case. Wireless packet capture under Windows has always been a slightly tricky task; Ekahau Capture and the Sidekick make it really easy, and the dual radios mean you can get complete (non-scanning) captures on two channels at the same time.

I’ve briefly mentioned Ekahau Cloud, but it’s worth exploring a little bit because it makes sharing projects easy. This is a big help for teams. It also means it’s possible for a team to work on different floors of the same building, and sync all that data back to the same cloud project.

I don’t want to neglect Ekahau Pro in this big update as it’s had more than just a new name. Quite a lot has changed under the hood. The visualisations are improved and I believe there’s also been some work done on improving the prediction algorithms.

The bottom line is that if you’re already using Ekahau tools, especially if you already have a Sidekick, you’ll want to spring for this new suite, so it’s worth putting together a case for management or your accountant to consider.

Professional tools – Ekahau

As I started to take up the mantle of WiFi human for a university campus, it was mentioned that we had “the Ekahau laptop”. This turned out to be a woefully underpowered old netbook with Ekahau Site Survey installed. Nobody knew how to use it. So I learned.

Fast forward a few years and I’m an Ekahau Certified Survey Engineer who’s designed and surveyed a lot of our campus using this tool.

Ekahau Site Survey is, as the name suggests, a survey tool. It’s also a WiFi design tool. I’ve used it extensively for both tasks and it’s probably one tool I’d struggle to do without.

One of the strengths of ESS is its relative simplicity. At the most basic level you import a floor plan, set a scale and then you’re ready to use this for predictive design work, or a real world survey.

Surveying is a matter of walking around a building, while clicking on your location on the floor plan. There’s a technique to this of course, but it really is just a matter of walking the floors.

To use Ekahau Site Survey as a design tool you’ll need to draw on walls, doors, filing cabinets and other attenuation areas as appropriate. Then you can place APs on the plan and ESS will show you what coverage can be expected.

What should be obvious about both predictive and survey data is that the pretty visualisations generated by ESS will show you exactly what you’ve told it. If you put junk data into your predictive model by saying all the walls are drywall with a 3dB attenuation, your design isn’t going to work very well when it turns out you have 10dB brick walls. So it’s important to have some idea of the building construction and, ideally, have taken measurements.
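To make the garbage-in, garbage-out point concrete, here’s a minimal sketch of the idea using plain free-space path loss plus a per-wall attenuation figure. The transmit power, distance and wall values are illustrative assumptions, not what Ekahau’s prediction engine actually does.

```python
import math

def fspl_db(distance_m, freq_mhz):
    """Free-space path loss in dB (distance in metres, frequency in MHz)."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55

def predicted_rssi(tx_power_dbm, distance_m, freq_mhz, wall_losses_db):
    """Subtract free-space loss plus the attenuation of every wall in the path."""
    return tx_power_dbm - fspl_db(distance_m, freq_mhz) - sum(wall_losses_db)

# Same AP, same client position, two different guesses at the wall material.
drywall = predicted_rssi(14, 15, 5180, wall_losses_db=[3, 3])    # assumed 3 dB walls
brick   = predicted_rssi(14, 15, 5180, wall_losses_db=[10, 10])  # measured 10 dB walls
print(f"drywall model: {drywall:.1f} dBm, brick reality: {brick:.1f} dBm")
```

The gap between those two numbers is roughly the difference between a comfortable signal and a steady stream of complaints.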

Likewise with the survey side, if you’re inaccurate with your location clicking or walk 100m between clicks and force ESS to average the data over too large an area, you’ll get a result that’s not as useful.

In short, you do need to know how to use the tool – just like anything.

A quick mention of WiFi adapters. ESS works by scanning the selected WiFi channels (all available in your regulatory domain by default) and recording information and the received signal strength of the beacons transmitted by access points. It’s necessary to have a compatible WiFi adapter that can be placed in monitor mode. Low cost options are available. If you give ESS two or three adapters it will spread the channel scan across these, allowing data to be gathered more quickly. ESS will also use the built-in WiFi of your laptop to ping an IP or perform speed tests against an iperf server.
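As a rough feel for why extra adapters help, here’s a toy calculation of a scan cycle with the channel list split across NICs. The dwell time and channel list are illustrative assumptions rather than Ekahau’s actual scan behaviour.

```python
# Illustrative channel lists; adjust for your regulatory domain.
channels_24 = list(range(1, 14))                     # 2.4 GHz channels 1-13
channels_5 = [36, 40, 44, 48, 52, 56, 60, 64, 100, 104, 108, 112,
              116, 120, 124, 128, 132, 136, 140]     # a UNII-1/2/2e selection
all_channels = channels_24 + channels_5
dwell_s = 0.25                                       # assumed dwell per channel

for adapters in (1, 2, 3):
    per_nic = -(-len(all_channels) // adapters)      # ceiling division
    print(f"{adapters} adapter(s): ~{per_nic * dwell_s:.1f} s per full scan cycle")
```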

I started with a single USB interface, which I used to bash on door frames, before upgrading to a quick release (lego glued to the laptop lid) USB hub with three interfaces connected. This made the laptop lid too heavy and it would fall backwards.

To counter these first world problems, but also to allow for other interesting functionality, Ekahau made the Sidekick. It’s a neat box containing two dual band WiFi radios (802.11ac), a very fast spectrum analyser, processing capability and storage, and a built in battery.

For surveying Sidekick isn’t a necessity but it makes life much easier. The data gathered is more complete, the laptop battery lasts longer and the spectrum analyser capability turns ESS into a powerful troubleshooting tool.

I don’t suppose we could…

Situated across the lake, next to a lane that borders some fields, is the outdoor lab site of an ecology project into moorland management. Fascinating in itself, the team tending a very strange allotment-sized plot of land are recording data and processing e-mails while almost literally in the field.

The site is about 300m away from the nearest external WiFi AP in that part of campus and, despite the distance, the 2.4GHz band is surprisingly usable providing you stand in just the right place. Because it nearly works, the suggestion was to try building a DIY antenna out of kitchenware and a high-power USB wireless NIC of dubious legality.

I recommended against this and instead have been able to set up an Aruba AP-275 linked back to campus with a point-to-point wireless bridge.

The gear in question is a pair of Ubiquiti Networks Nanobeam AC, part of the company’s Airmax range of products. This is the first time I’ve used any Ubiquiti gear on campus but I’ve long been a fan of what this gear can achieve for a really modest outlay.

Ubiquiti gear gets a bad rap among wireless geeks. There’s some good reason for this. It’s pretty cheap and their UniFi managed WiFi offering has long lacked features that would really qualify it to be truly ‘Enterprise’. The Airmax gear is also inexpensive, built to a price and, frankly, it can look a bit flimsy. Next to the Aruba AP-270 series the Nanobeam looks almost comical in its lack of weather sealing. However, I put up a pair of previous-generation Nanobridge M5 devices somewhere in the wilds of North Yorkshire several years ago and haven’t had to touch them since. Wireless ISPs like Beeline Broadband have been using affordable gear from Ubiquiti and Mikrotik for years to bring broadband to areas that otherwise end up with DSL speeds little faster than dialup.

I think one of the reasons this gear gets a bad name is the way it’s sometimes used. Ubiquiti make some quite high gain antennas and it’s very easy to significantly exceed the power levels permitted in a regulatory domain. I’ve come across badly installed, poorly aimed radios where the country has been set to whichever would let the installer turn the power up to a metaphorical 11 (but probably higher than that). Because the equipment is inexpensive and accessible this is probably not a great surprise. There have been some firmware shockers too, and again bad practices have left radios running in the wild with critically vulnerable firmware.

The Airmax gear may not be engineered like our Aruba external APs but it’s affordable, functional, can certainly be reliable and I have to say it’s a joy to use with a really nice user interface. Ubiquiti also make a management server available called UNMS. It’s still in beta, but it does a good job of providing a single pane of glass for seeing the network status and managing your Airmax radios.

This very modest link distance of 280 metres means the Nanobeams can achieve 256QAM to provide 150Mbps throughput with a 20MHz channel width. It may be a distance that WISP engineers would laugh at… but it’s been a useful problem solver and the hardware cost under £200.
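For anyone wondering why 256QAM comes so easily at this range, a back-of-envelope link budget tells the story. The EIRP and antenna gain below are illustrative, ballpark figures rather than the exact settings used on this link.

```python
import math

def fspl_db(distance_m, freq_mhz):
    """Free-space path loss in dB."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55

eirp_dbm = 36      # assumed transmit EIRP
rx_gain_dbi = 19   # the Nanobeam's dish, roughly
loss = fspl_db(280, 5500)
rx_dbm = eirp_dbm + rx_gain_dbi - loss
print(f"Path loss ~{loss:.0f} dB, expected receive level ~{rx_dbm:.0f} dBm")
```

Even after knocking a healthy chunk off for real-world losses, that leaves an enormous margin over what 256QAM needs, which is why such a short hop is trivial for this class of gear.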

IPv6 on the WiFi

Yes yes yes… We all know that we ran out of IPv4 addresses long ago and IPv6 has been around for 300 years and is the solution to all our problems.

But, we’re still not using it. So why not?

“IPv6 is haaard”

For people who are used to an IPv4 world, with 32 bit addresses you can remember like 178.79.163.251 and broadcast domains with a few hundred addresses, IPv6 can be a terrifying prospect with 128 bit addresses that look like 2a01:7e00::f03c:91ff:fe92:c52b, umpteen billion addresses per subnet, a whole new reliance on ICMP and… well, it’s just different.

I never fail to be slightly surprised how conservative and resistant to change some folk in IT can be. It’s just human nature of course, to keep things working and try to avoid too much rapid change, but still.

Dabbling in IPv6 can result in some odd behaviour as it will generally be used in preference to IPv4. So if you can resolve an IPv6 address but can’t route to it, things break. There are ways to address this and most web browsers do, but people have been caught out and that has led to IT departments disabling IPv6 on all Windows systems, for example.

Truth be told, IPv6 isn’t all that hard, so why haven’t we deployed it on our WiFi?

She doesn’t have the capacity Jim!

The problem on our network is address table capacity. Previously I’ve talked about a problem we encountered with filling the ARP table on the core routers. Keen not to be burned twice by the same problem, we found IPv6 presents new challenges.

Firstly there’s the obvious issue of each address being four times the size and therefore needing more memory. Then there’s the issue of just how many addresses you’re dealing with.

With IPv4 a client on our network requests an address via DHCP, is given one, and that’s the end of it.

With IPv6, clients come up with a link-local address (starting FE80:) which can be used to talk to other devices on the same subnet without any configuration being done at all. Then, using SLAAC (Stateless Address Autoconfiguration), the router advertises its address and, by virtue of that, the local subnet. Clients then come up with their own IPv6 address based on their hardware MAC address, which will (probably) be a globally routable address. But this address, built out of the MAC address of the client, can be used to track individual machines as they move around different networks. To avoid this most operating systems will also come up with a privacy address. This is another IPv6 address that’s valid in the subnet but is not based on the MAC address of the hardware. Because there are so many addresses in an IPv6 subnet, the chance of a conflict is… well, you don’t have to worry about it.

These privacy addresses are what the client uses to talk to the world, but the client will also respond on its SLAAC address. Privacy addresses are also changed periodically. When a privacy address changes, the client may hold on to the old one in case there’s any incoming traffic still trying to use it.
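To make the tracking concern concrete, here’s a minimal sketch of how the EUI-64 style SLAAC address is built from a MAC address: flip the universal/local bit and splice FF:FE into the middle. The prefix is the RFC 3849 documentation prefix and the MAC is made up.

```python
def slaac_address(prefix: str, mac: str) -> str:
    """Build an EUI-64 interface identifier from a MAC and append it to a /64 prefix."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                                  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]     # insert FF:FE in the middle
    groups = [f"{eui64[i] << 8 | eui64[i + 1]:x}" for i in range(0, 8, 2)]
    return prefix + ":".join(groups)

print(slaac_address("2001:db8:1:1:", "00:1b:63:84:45:e6"))
# -> 2001:db8:1:1:21b:63ff:fe84:45e6 (the MAC is clearly visible, hence privacy addresses)
```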

The long and short of all this is that you have to ensure your network equipment can handle the number of IPv6 addresses required for the number of clients being supported.

In our testing so far we’ve seen an average of 2.6 IPv6 addresses per client. We think our routers could just about handle that for our regular concurrent client count, but with no room for growth it’s asking for trouble.
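The sizing arithmetic is simple enough to be worth writing down. The client count below is purely illustrative; the point is that the result has to be compared against the router’s neighbour (ND) table limit rather than its ARP table limit.

```python
clients = 15_000              # assumed concurrent WiFi clients
addresses_per_client = 2.6    # link-local + SLAAC + privacy addresses (our observed average)
nd_entries = clients * addresses_per_client
print(f"~{nd_entries:,.0f} neighbour-cache entries, before any growth headroom")
```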

It’s worth mentioning that whilst we really do want to provide globally routed IPv6 addresses to the WiFi clients, this isn’t something we actually need to do right now, though we do expect it will be required in the foreseeable future. We do have options available if we had to make this work today, the easiest being to spread the subnets over a few routers so as to avoid the need to replace the core routers. We could also just buy some hardware with enough capacity to handle just the WiFi traffic routing.

This situation though presents an opportunity to look again at our whole network and maybe simplify some of it, using technologies such as VXLAN that weren’t available on the hardware we were using previously.

Suffice to say our WiFi is ready for IPv6: the firewall rules are built and tested, the RADIUS accounting all works… we just need the rest of the network to catch up.

Wireless home automation

Some people are massively into home automation, with a motor and remote control fitted everywhere they possibly can be. I’ve largely not really seen the point of it. I live in a small house, the light switch is never far away. I’ve also found it baffling that in order to switch on the light I need an internet connection.

However, I did buy an internet-connected heating control system, opting for Hive by British Gas. The system is easy to install and designed to be a direct replacement for many UK domestic installations. The wireless side of the system uses Zigbee and it consists of a boiler control, a wireless thermostat and a hub that connects to your network. Overall it’s worked well in allowing me to remote control the ancient heating system in my house, but it isn’t really very sophisticated.

The Hive thermostat is a bit… basic. As far as I can tell it’s old school in that it runs the heating until the temperature measured by the thermostat reaches the target and then stops. The problem is the radiators are then hot, so the temperature in the room keeps rising and will significantly overshoot the target. Before the room temperature has dropped back below the target it can start to feel a little cool, because we’ve just acclimatised to that higher temperature. The result is you turn up the heating and end up running the system at a higher temperature than is necessary to be comfortable.

This is how heating control systems have worked for years, but there’s a much better way. Secure (Horstmann) and a few others implement something called Time Proportional Integral (TPI). I don’t pretend to know how this works, but the result is the heating system runs for shorter bursts, switching a predetermined number of times per hour until the temperature is reached and reducing the overshoot that’s common with simple thermostat control.
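I won’t claim this is Horstmann’s actual algorithm, but a very rough sketch of the time-proportioning idea looks something like this: the boiler fires for a proportion of each fixed cycle, and that proportion shrinks as the room approaches the target, so there’s less heat left in the radiators to overshoot. The real thing adds an integral term, and the cycle length and proportional band below are made-up figures.

```python
def burn_minutes(room_temp_c, target_c, cycles_per_hour=6, band_c=2.0):
    """Minutes of boiler run time in one cycle, proportional to how far below target we are."""
    cycle_len_min = 60 / cycles_per_hour
    error = target_c - room_temp_c
    duty = min(max(error / band_c, 0.0), 1.0)   # clamp to 0..1 within a 2 degC band
    return duty * cycle_len_min

for temp in (17.0, 18.5, 19.5, 20.0):
    print(f"room {temp:4.1f} C -> fire boiler for {burn_minutes(temp, 20.0):.1f} min of a 10 min cycle")
```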

We have recently got a place in Northern Ireland which we’re using as a base to visit family more frequently. The controls there use TPI but are otherwise a standard wired system. I want remote heating control so I can keep an eye on the temperature and make sure it isn’t getting too cold, and it would also be really useful to be able to turn the temperature up before we visit in winter.

Hive is out for the reasons above. Other systems such as Nest that do clever learning of your life patterns are useless in a building that’s not fully occupied all the time. So I’ve ended up rolling my own using Z-Wave controls and Home Assistant, running the hass.io distribution on a Raspberry Pi.

Z-Wave is a really interesting wireless home automation protocol. Like Zigbee it employs low bit rate, low energy RF so devices can be battery powered. It is a proprietary protocol though, unlike Zigbee, and whilst I tend to prefer open standards, in the real world having a proprietary chipset in every device means it’s easier to get devices from different manufacturers to work together. I have a USB adapter in my Raspberry Pi so it can act as the Z-Wave controller, but a particularly neat feature is that devices can be directly paired.

This means I can set up my thermostat to directly control the boiler switch. The state of the two devices is also reported on the Z-Wave network. There’s a really big win with this. If I had a temperature probe in the room and my automation server had to turn on the heating based on a rule, what happens if my automation server fails? No heating. Also, adjusting the heating becomes harder. The beauty of using Z-Wave controls, and directly pairing them, is I can have a normal looking thermostat on the wall, and this directly controls the heating. But then I can control the temperature set point of that thermostat remotely using Home Assistant. I can also override the boiler control should the thermostat fail (unlikely) or the batteries run out (more likely).

This gives me the same level of control I have with Hive, but the system isn’t reliant on the hub device being in the middle, I only need my thermostat and my boiler control to make it work. Then you can start getting into the smarter automation functions. Home Assistant can pull in calendar information using caldav and turn that into a switch. So I can automate whether the heating is set above frost protect based on whether there’s a booking in the house booking diary. Which means I don’t have to worry about a visitor using our house turning the heating up and leaving it.
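The calendar-driven part can live entirely in a Home Assistant automation, but as a sketch of the idea, here’s the same logic expressed against Home Assistant’s REST API. The entity names, URL and token are placeholders, and the temperatures are just examples.

```python
import requests

HA = "http://homeassistant.local:8123"
HEADERS = {"Authorization": "Bearer <long-lived-access-token>",
           "Content-Type": "application/json"}

# A calendar entity reports "on" while an event (a house booking) is active.
booking = requests.get(f"{HA}/api/states/calendar.house_bookings",
                       headers=HEADERS, timeout=10).json()
target = 19.0 if booking["state"] == "on" else 7.0   # comfort vs frost protection

# Push the set point to the Z-Wave thermostat via the climate service.
requests.post(f"{HA}/api/services/climate/set_temperature", headers=HEADERS,
              json={"entity_id": "climate.hall_thermostat", "temperature": target},
              timeout=10)
```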

Using Home Assistant with Z-Wave allows for other neat control options such as lighting, blinds, whatever, in order to make an empty home look less empty. But it also allows for those controls to always have a local activation so when a family member who doesn’t own a smartphone wants to use our house, that’s no problem. Essentially I can build up a home automation system to be as simple or sophisticated as I want, but never need to have an internet connection in order to turn on the bathroom light.

But for now, it’s just going to be the heating.

What? No ARP?

Here’s a quaint little problem that hit our network recently… We ran out of space on the ARP table of our core routers.

“What the hell?” I hear you say, “what is this, 2003?” and well you might ask.

Over the last couple of years we’ve been upgrading our campus network, which represents a very large number of switches, from primarily HP ProCurve 2600 series at the edge with 5400zl switches doing the OSPF routing, to primarily the Comware range: HPE 5130 at the edge, 5900s doing the OSPF, and 5930s at the core to replace the previous 5900s (set up in a terrible design that meant we could never upgrade them without bringing everything down). It’s a slow process because we try not to break things as we migrate them and, as I mentioned, it’s a lot of switches.

Whilst we have a lot of edge ports, our network isn’t really all that complicated; there’s just a lot of edge. So an IRF pair of 5930s as the core router in each datacentre seemed to be just fine.

Very recently we upgraded our Aruba mobility controllers and moved them off the 5400 switches that have been handling all our WiFi traffic and on to some of the new kit in our datacentres.

So far so good.

We then slowly moved the subnets from the 5400s to our 5930s. This work was well over 50% complete when one morning, just near the start of the university term, we seemed to have a problem.

Some WiFi users seemed to have no connectivity. We quickly established that plenty of traffic was flowing from the WiFi controllers and through our off-site link. Whilst the problem seemed to be fairly serious, it wasn’t affecting hundreds of people, as far as we could tell.

Theories flew round the office as we tried to understand what was happening and why some users seemed to have a perfectly good WiFi connection, with layer 2 traffic passing, but couldn’t do anything useful… There seemed to be no pattern, and a broken user would suddenly start working with nothing having changed.

It was spotted that we had 16,384 entries in the ARP table of our 5930s. This was initially dismissed as a small number, but one of my brilliant colleagues pointed out that it was a rather neat, round number and that wasn’t likely to be a good thing.

So it turns out that all of the Comware switches we’re using as routers (5510, 5900 and 5930) have a maximum ARP table size of 16,384 entries.

As this term has kicked off we’ve seen higher numbers on our WiFi alone, and the 5930s are also routing for all the servers in our datacentres.

This was a pretty basic problem. We’d just filled up the tables and our routing switches were no longer doing the business.

This issue caught us out because, as previously mentioned, this is quite a small number. The 5930 will handle 288K MAC addresses and a route table of over 100K entries. More significantly, the decade-old switches we were replacing could handle it.

Another reason this slipped past is the ARP table size doesn’t appear on the spec sheets of many switches.

We just assumed these very capable datacentre switches had the horsepower and memory allocation to do what we needed, and assuming made an ass of whoever.

Cisco fans will tell you they’ve long had the ability to allocate the finite amount of memory in a layer3 switch to the tables you need. Fortunately this functionality is now being made available to the 5930 (don’t know about the other models).

In our case this means we can reduce the routing table size (I don’t think we need 10K routes, never mind 100K+) and give more room for ARP entries. We can then try again to move the WiFi subnets and, hopefully, avoid problems.
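This time we’ll be keeping a closer eye on the table as the subnets move. Since the limit rarely appears on a spec sheet, the live count is the thing to watch; here’s a rough sketch that walks the standard IP-MIB ARP table over SNMP and reports headroom. The hostname, community string and the 16,384 limit are ours or assumed, and you’d want to confirm your platform populates this MIB.

```python
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, nextCmd)

ARP_COLUMN = "1.3.6.1.2.1.4.22.1.2"   # ipNetToMediaPhysAddress: one row per ARP entry
LIMIT = 16_384                        # platform maximum (assumed)

count = 0
for err_ind, err_stat, _, var_binds in nextCmd(
        SnmpEngine(), CommunityData("public"),
        UdpTransportTarget(("core-router.example.net", 161)), ContextData(),
        ObjectType(ObjectIdentity(ARP_COLUMN)), lexicographicMode=False):
    if err_ind or err_stat:
        break
    count += len(var_binds)

print(f"{count} ARP entries of {LIMIT} ({count / LIMIT:.0%} used)")
```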

The lesson from this has to be the importance of understanding the spec of a box you’re putting at the centre of a network. I can comment on the importance of assessing the implications of a change, such as moving the routing for 16K clients, but to be honest we’d still have assumed the big switches could handle this.

So, roll on the disruption of multiple reboots to bring our 5930s up to the software version that will do what we need, and in the meantime the venerable 5400zl continues to just work.

Finally, I should stress that our network is really very reliable. Serious outages used to be relatively commonplace and I can’t remember when we last experienced an unexpected widespread outage. This, again, is why we were caught out by this. The 5930s have just been rock solid and they spoiled us.

Audinate Dante, with lots of switches, and latency

I’m sure someone once said you shouldn’t use commas in a headline or title but they’re not the boss of me and I don’t care. So, to business…. Here’s one of the rare posts that isn’t about the WiFi.

As someone who’s done my fair share of live sound and studio audio work, I’m something of a gear head. One of the big developments in the area of digital audio has been networked audio and by far the biggest, and some would say only truly relevant player, is Australian company Audinate with their product Dante.

For those that don’t know, Dante is a networked audio system that allows multiple unicast or multicast streams of high quality, uncompressed, sample-accurate audio to be routed between devices across a network. It’s based around custom chips that are designed and sold by Audinate, then used by pretty much every serious pro audio equipment manufacturer in the world.

Perhaps the best thing about Dante is that it’s just another network device. It doesn’t need anything special to work, and it will work on standard network switches. It’s perhaps no surprise that Dante has become really popular in AV installs. You can even get PoE-powered speakers, so rolling out a PA system through conference centre corridors can be done with nothing more than a network cable. It’s neat stuff.

But this post is less about Dante itself, and how awesome it is, and more about how it can impact design decisions in modern networks.

For this example I’m going to use a large lecture hall – a flexible space that can seat 1500 people and is used for conferences, lectures, public events, etc.

A key consideration of a digital audio system is latency, which generally refers to how long it takes audio to make its way through the system from, say, a microphone input to the speaker output. The old analogue audio system had effectively no latency. Electrons being inconvenienced along the various mic cables, through the mixer, amplifiers and out to the speakers all happens extremely quickly. In digital systems we have analogue to digital conversion, which takes a bit of time, and we invariably introduce buffers, which hold things up and leave data waiting around. Almost every part of a digital audio system is slower than analogue. Add all these delays together and the system latency can easily get too long to be acceptable.

The same can be said about networking. Every network device your frame or packet passes through will add a bit of latency.

Dante moves audio across an Ethernet network, and switched Ethernet is really very fast (worth noting that a standard Dante network doesn’t route across subnets). Dante conceptually supports incredibly low latency settings, but most of the chipsets have a minimum latency setting of 1ms. This means anything receiving a stream from that device will allocate a 1ms buffer and will therefore handle network latency of up to 1ms without any disruption to the audio. If the latency exceeds 1ms then you’re in trouble.
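It’s worth pausing on what that 1ms actually buys you in audio terms. This is just the standard arithmetic at Dante’s usual 48kHz sample rate, nothing device-specific.

```python
sample_rate_hz = 48_000
buffer_ms = 1.0
samples_of_slack = sample_rate_hz * buffer_ms / 1_000
print(f"A {buffer_ms:.0f} ms receive buffer is only {samples_of_slack:.0f} samples of slack "
      "for the network to eat into")
```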

A brief blip might be barely noticeable, but sustained high latency will result in late packets being dropped and a particularly unpleasant-sounding audio stream.

Having discussed latency, and network switches, let’s get to the core of the issue…

Recently installed Dante equipment wasn’t working very well in the lecture hall. The network switches are HPE 5130s with IRF stacking. There are three switch stacks in the building, connected back to an IRF pair of HPE 5900 switches (acting as a router) with 2 x 10Gb links from each stack in a LACP bundle. Stack 1 has two switches, stack 2 has five and stack 3 has nine.

From an administrative perspective there are three switches in the building, though of course we have to remember there are really far more… Why? Because latency, that’s why.

Our problem Dante devices are situated in mobile racks, each containing its own switch – this allows them to be easily unplugged and moved around.

It turns out one mobile rack was connected to stack 2 and the other to stack 3. With 1ms latency configured, Audinate reckon Dante is good for ten gigabit switches or about three 100Mb switches (because Fast Ethernet has higher latency), but we were having problems, so just how many switches was our traffic passing through?

The trouble is, I don’t know. Because our Comware switches are in IRF stacks, how traffic moves around between members is inside the black box and not available for analysis. Our stacks are a ring topology and we have two uplinks; I don’t know which way around the ring the traffic is going or which of the two uplinks it’s using.

So I have to assume a worst case scenario, even if this shouldn’t actually happen. Here goes: my first mobile rack switch is connected to stack 2 member 1 and that stack hashes my traffic to the second uplink, which hangs off the last member of the stack. For some reason, rather than taking the shortest route, my traffic makes its way the long way around the ring, therefore passing through a total of five switches.

It then heads to the routing switch, also an IRF pair, so let’s assume we pass through both of them and on towards stack 3. Again we have to assume the worst case scenario: that the traffic passes through all nine switches.

The end result is that it’s possible the traffic could traverse 18 switches.

It’s hard to put firm figures on this. According to the HPE 5130 datasheet the gigabit latency is <5us. What I don’t know is whether that changes as more complex configs are applied. 10Gb latency is even lower at <3us. However, we don’t know how, or even if, IRF affects this. These switches are blisteringly, mind-bendingly fast. Even a much cheaper switch is really, really quick, and it’s why we can build switched networks capable of incredible performance. All hail the inventor of the network switch, whose name I can’t be bothered to look up.

Even that blisteringly fast performance x 18 adds up, but not to much. If our Dante traffic follows the worst case scenario, however unlikely, we still only get to something like 200us of latency. This should be absolutely fine, but it wasn’t reliable. I suspect a big part of this is general network conditions. The more physical switches your Dante traffic passes through, the higher the chance it faces moments of congestion.
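For what it’s worth, here’s the back-of-envelope sum behind that ~200us figure. The per-hop switching latency comes from the datasheet; the frame size and the assumption that every hop is store-and-forward at 1Gb/s are illustrative guesses.

```python
hops = 18                        # worst-case number of physical switches in the path
switch_latency_us = 5            # datasheet "<5 us" per gigabit hop
frame_bytes = 700                # assumed size of a Dante audio packet
serialisation_us = frame_bytes * 8 / 1_000   # time to clock the frame out at 1 Gb/s

total_us = hops * (switch_latency_us + serialisation_us)
print(f"~{total_us:.0f} us worst case, against a 1,000 us (1 ms) receive buffer")
```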

Back to reality then. Increasing the latency setting of the Dante device would remedy this at the cost of increasing the total audio system latency. If that’s acceptable, it’s the best choice because it provides a healthy margin. The other alternative is to change the network patching so we limit the number of switches. In this case we’ve been able to keep the critical devices connected to stack 3; the very worst case is now 11 switches without touching the uplinks, and in fact most Dante devices are linked to the same stack member.

At the end of this long, confusing mess of writing then I’m left uncertain quite what was causing the issue. Our theoretical latency was well below the minimum required and yet we ran into issues. Probably the most important takeaway for a network engineer is that when working with something as latency sensitive and uncompromising as Dante, it’s important to consider the number of devices in the network path… you know, just like the deployment guide says.

Aruba Control Plane Security and the AP-203H

Here’s a useful little tidbit for anyone with an Aruba OS 6.5 environment wanting to enable control plane security with some AP-203H units on the network.

With Aruba campus APs all traffic between users and the controller is encrypted (probably) by virtue of the WiFi encryption in use being between the client and the mobility controller, rather than the AP. So unless you’re using an open network, in which case I must confess I don’t know what happens and I presume that’s unencrypted on the wire as well, all your client traffic is encrypted until it pops out of the controller. Lovely.

Control plane traffic, the AP talking to the controller, is not. This isn’t usually a problem as we mostly trust the wires (whether we should or not is another matter).

In our environment all the user traffic is tunneled back to the controller in our data centre, which works very well especially considering our users are highly mobile around a university campus.

However it’s also possible to bridge an SSID so the AP drops the traffic on a local vlan. This is desirable in some circumstances, for example we have a robotics research group who need to connect devices to a local subnet. In order to enable this you first have to switch on control plane security.

Switching CPsec on causes all APs to reboot at least twice, so on an existing deployment it leads to 8-15 minutes of downtime. I switched CPsec on this morning for our network of approximately 2500 APs; it was a tense time but went well… mostly.

I’d read stories of APs with a bad trusted platform module being unable to share their certificate with the controller. We have a lot of APs, from many years old to brand new, and even a low percentage failure rate would present a problem.

In the end two APs failed, with lots of TPM initialization errors. However all our AP-203H units failed to come back up and the controller process logs started showing things like this:

Sep 4 07:57:37 nanny[1399]: <303022> <WARN> |AP <apname>@<ip_addr> nanny| Reboot Reason: AP rebooted Wed Dec 31 16:01:41 PST 1969; Unable to set up IPSec tunnel to saved lms, Error:RC_ERROR_CPSEC_CERT_REJECTED

It took a little while to become clear that just one model of AP was affected, and that every single one of them was.

A bit of time with Aruba TAC later and it transpires it’s necessary to execute the command: crypto-local pki allow-low-assurance-devices 

This needs to be run on each controller separately, and saved of course.

The command is documented as allowing non-TPM devices to connect to the controller. I’m not entirely clear what’s special about the AP-203H: presumably it doesn’t have a TPM, yet it does have a factory-installed certificate. We also have some AP-103H units, which don’t have a TPM but work just fine with a switch-cert. I suspect this is a bug in the firmware and Aruba OS is treating the 203H as if it has a TPM, but as it doesn’t, everything falls apart.

Clearly, if you want the maximum possible security, allowing low-assurance devices is going to raise eyebrows. In our case we’re happy with this limitation.

May this post find someone else who one day searches for RC_ERROR_CPSEC_CERT_REJECTED ap-203h 

Where’s the data?

On my quest to learn, this week I had the privilege of attending Peter Mackenzie’s Certified Wireless Analysis Professional class. When a colleague and I attended Peter’s CWNA class a couple of years ago he suggested that CWAP should be the next port of call after passing the CWNA exam. Initially I thought that was mainly because he wrote the book (and an excellent book it is too) but actually CWAP goes deeper into the protocol and builds an even better foundation for an understanding of how WiFi works.

Most people who’ve dabbled in networking are familiar with Wireshark. It’s a fantastic tool, and a great way of troubleshooting problems. With packet capture you can prove that a client isn’t getting an IP address because it isn’t doing DHCP properly (if at all), rather than it being a wider network issue… for example.

Wired packet capture is often relatively easy. Mirror a port on a switch (hub out), put an actual honest-to-goodness old-school hub in line (if you have one and can tolerate the lower bandwidth during capture), or if you have the resources get a fancy tap or dedicated capture device such as the Fluke OneTouch. Usually we have one of these methods available, but for WiFi it’s not quite so easy.

Mac users have these fantastic tools available, and there are good options for Linux users but for Windows folk life can be tough and expensive.

WiFi drivers for Windows tend not to expose the necessary level of control to applications. So you’re left needing a wireless NIC with a specific chipset for which there’s a magic driver, or getting hold of an expensive dedicated capture NIC.

There is one option that I’ve played with, in the form of Tarlogic’s Acrylic WiFi. This affordable software includes an NDIS driver that interfaces with the WiFi NIC and presents monitor mode data to applications. Analysis within Acrylic is fairly poor, but it will save pcap files and it’s possible to use the Acrylic driver directly in Wireshark.

The problem is that many new drivers don’t interface with NDIS in Windows the way they used to, so there’s a shrinking number of WiFi NICs that still work.

Some time ago I bought an Asus AC53 USB NIC, which is on the Acrylic supported list, and it is possible to install the old Windows 8.1 drivers on Windows 10. However, this doesn’t support DFS channels. Rats.

Fear not though: it is, just about, possible to make this work. The Netgear A6200 uses the same Broadcom chipset and supports DFS channels. Once installed, I was able to select the Netgear driver for the Asus device and, sure enough, it works.

Which is a long lead-in to this little tidbit, a key takeaway from Peter’s course this week.

When looking at a wireless capture it’s important to remember you might not be seeing all the information. Physical headers are stripped off by the hardware of the WiFi interface, so you can’t see those. WiFi does certain things with physical headers only, such as MU-MIMO channel sounding, and aspects of this are therefore not visible to a protocol analyser.

It probably goes without saying that you can’t see encrypted data, but you can usually see that it’s been transmitted… Unless your wireless interface can’t decode it.

Let’s say, for example, your AP is using a 40MHz DFS channel and the capture setup you’re using can’t be configured to use 40MHz channels. In this scenario, because management and action frames are all transmitted on the primary 20MHz channel you can see these just fine, yet all the higher rate data frames that take advantage of the full 40MHz just disappear.

The result looks a bit like the picture here… A client sends RTS, the AP responds CTS and the next thing is a block acknowledgement but no data.
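A quick way to spot this pattern in a capture is simply to tally the frame types and notice that the data frames are missing. Here’s a rough sketch using pyshark; the filename is a placeholder and the exact field representation can vary between Wireshark versions.

```python
from collections import Counter

import pyshark

cap = pyshark.FileCapture("survey-channel100.pcapng", display_filter="wlan")

counts = Counter()
for pkt in cap:
    # wlan.fc.type_subtype identifies RTS, CTS, block ack, QoS data, etc.
    counts[pkt.wlan.fc_type_subtype] += 1
cap.close()

for subtype, n in counts.most_common():
    print(subtype, n)
# Plenty of RTS, CTS and block acks but few or no QoS data frames suggests the
# data is being sent on a channel width your capture NIC can't decode.
```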

Often in this situation it’s possible to see the data transfer betrayed in the time between the packets but here, because the data rate is high and the traffic is light, it’s not particularly apparent.

Understanding what’s going on and what’s missing requires knowing how the protocol is supposed to work, and a bit of insight to how capturing takes place in a wireless NIC.

I’m particularly looking forward to spending more time digging into protocol analysis, and hopefully getting some better tools.