Wi-Fi Design in Higher Ed

I was recently invited to speak at the London Wi-Fi Design Day hosted by Ekahau and Open Reality. It was a fantastic day: great to catch up with people in person, and there were some excellent talks (if I say so myself).

You can watch all the talks, from this and previous years, on this playlist. Talks from London 2021 start at number 41 as the list is ordered today.

My talk, some thoughts on Wi-Fi Design in Higher Education, can be watched below.

I was particularly challenged by Peter Mackenzie’s talk on troubleshooting and the idea that we all tend to jump to answers before we’ve asked enough questions. Hugely helpful and highly recommended.

Trouble wi’ broadband

A decent domestic broadband circuit is pretty important for most of us, especially if, like me, you work from home some of the time. I’ve been fortunate that over the last few years things have been pretty good; however, that may be changing, and the frustrating relationship between ISPs and transit providers operating in the UK makes for annoyingly difficult decisions.


ClearPass auth failure diagnostic

Here’s a “learn from my recent experience” type post.

The problem: Clients are unable to authenticate from a new Wi-Fi network that has been added

Observations:

  • ClearPass appears to be working fine
  • Clients are successfully authenticating from the existing network using EAP-TLS
  • A Policy Manager service has been configured for the new network and incoming requests are correctly categorised
  • Authentication attempts from the new network are rejected, seen in Access Tracker
  • Failing auths are showing as an outer type of EAP, not EAP-TLS
  • No certificate content is shown in the computed attributes of the failed auths
  • Apple Mac clients are able to authenticate to the new network successfully; managed Windows clients are not. The same clients work fine on the existing network.

The obvious conclusion is that the new network is incorrectly configured, and this turned out to be the case, but what exactly is wrong? The last point in the observations was particularly interesting and threw a spanner at the “network config error” idea: if the network config is wrong, why can a Mac authenticate? Is it the client? What’s the difference?

Connections from the new network were proxied via another RADIUS server. This is because the solution uses RADSEC and the new network’s Wi-Fi controllers don’t support RADSEC.

Access Tracker appeared to show insufficient information for the auth to be successful. Crucially, there was no client certificate information, and the outer method showing as just EAP was… odd.

Looking at the logs for an auth showed an error early in the process:

rlm_eap: Identity does not match User-Name, setting from EAP Identity.

Ultimately, here’s what the problem was: the proxy forwarding these authentications had a default setting that stripped domain information from the username. Windows clients, which presented the username as host\<hostname>.<domain>, had this stripped back to just the hostname, so the TLS tunnel’s outer username presented to ClearPass became hostname$. The Mac clients didn’t present the FQDN as the username, so nothing was stripped.

ClearPass checks the Outer Identity of the TLS tunnel against the Inner Identity. If the outer identity is valid but the inner identity differs, the auth will fail; in the case of EAP-TLS the error above is displayed in the logs.

The Outer and Inner Identities must either match, or the Outer Identity must be set to Anonymous.

Note this applies to EAP-PEAP and EAP-TLS; EAP-TTLS may also have issues with mismatches. Basically, make sure it all matches, and don’t have anything strip data from the outer identity unless it’s being set to Anonymous.

The solution to this was to disable the domain stripping on the proxy.
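
The fix doesn’t depend on a particular proxy platform, but as an illustration: if the proxy were FreeRADIUS, realm stripping is controlled per realm in proxy.conf, and adding nostrip forwards the User-Name unmodified. This is a minimal sketch, assuming a FreeRADIUS proxy and a hypothetical realm and pool name:

realm example.ac.uk {
        auth_pool = clearpass_radsec_pool   # wherever the requests are forwarded (name is hypothetical)
        nostrip                             # don't strip the realm/domain from User-Name before proxying
}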

For anyone who’s found this post after running into this issue, take heart: the information presented by Access Tracker is not at all helpful in understanding why the auth has failed. It appears the certificate hasn’t been presented at all, when in fact that data just isn’t shown to you.

The error message tells you exactly what’s wrong, once you understand how it works.

ClearPass, Intune and MAC randomisation

As more organisations have moved to Microsoft Azure AD and Intune to manage their devices, a common request is how to integrate this with Aruba ClearPass, which handles the RADIUS requests for Network Access Control. The most common deployment pushes certificates to the clients, which then use these for EAP-TLS authentication with ClearPass. Note this post doesn’t cover the basic Intune integration setup, which is documented in an Aruba guide.

But you probably want to know a bit more about the client than whether it has a valid certificate, so ClearPass has an Intune extension which downloads information from an Azure tenant’s Intune to the Endpoints Repository database.

This allows you to make policy decisions based on Intune attributes, such as compliance state, letting you place clients in different roles/VLANs depending on what the client is, whether it’s compliant with policy, its department, and so on.

However… the Endpoints Repository uses MAC addresses so problems start if clients are using MAC randomisation. The MAC address presented by the client won’t match what’s recorded by Intune so it won’t be possible to match against Intune attributes in the Endpoint.

The first thing to do is stop using the MAC address as the UID. Much better to use the Intune ID and have this written into the client certificate, either as the CN or a SAN, though you can use any value that ends up computed from the authentication.
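
As an illustration of how the Intune ID can end up in the certificate: an Intune SCEP certificate profile exposes the Intune device ID as the {{DeviceId}} variable (distinct from {{AAD_Device_ID}}), so a minimal subject name format might be something like the line below. Treat this as a sketch rather than a full profile:

Subject name format: CN={{DeviceId}}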

You can then either query Intune directly using the Intune Extension HTTP method or use the Endpoints Repository in a slightly different way.

I prefer the latter option most of the time because it provides a level of resilience as well as improving performance. Querying the Intune API via the extension works very well but if Intune is down (which has been known to happen) that won’t work. It will also take longer than querying a local database.

ClearPass assumes you’re using the presented MAC address as the UID of the Endpoint and it isn’t possible to change this. Instead you can query Endpoints as an SQL database with a filter that pulls out the attributes you require based on the presented certificate CN.

ClearPass databases can be accessed externally using the username ‘appexternal’ and the password which is set in cluster-wide parameters under the Database tab.

Next create a new Generic SQL DB Authentication source pointing to the local tipsdb and set a filter that pulls out the attributes you want for the auth session variable presented.

Server name: <server IP>
Database Name: tipsdb
Username: appexternal
Password: <password you set>
ODBC Driver: PostgreSQL
Password Type: Cleartext

A few things to note here. You can’t use localhost as the server name. If you have a VRRP address you can use this, otherwise you must use the actual IP of the server. This can cause complications in an environment with multiple ClearPass servers and no VRRP. There are ways around this but that’s for another blog.

The filter is the SQL query that pulls out the attributes you want based on an attribute presented. In this case we’re selecting ‘Intune User Principal Name’, ‘Intune Compliance State’ and ‘Intune Device Registration State’ for a record where the ‘Intune ID’ matches the Subject-CN of the client certificate:

select attributes->>'Intune User Principal Name' as "Intune User Principal Name",attributes->>'Intune Compliance State' as "Intune Compliance State",attributes->>'Intune Device Registration State' as "Intune Device Registration State" FROM tips_endpoints WHERE attributes->>'Intune ID' = LOWER('%{Certificate:Subject-CN}');

You then specify for each of these how you want the system to use the data it gets back, essentially either as an attribute or directly set as a role.

Add your custom authentication source into the service, and you’re good to go. You will now be able to make policy decisions based on the Intune ID lookup rather than the MAC address.
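
For example, an enforcement policy rule keyed on the lookup might look roughly like this (the authentication source name “Intune SQL Lookup” and the role name are hypothetical; the attribute names match the query above):

Condition: Authorization:Intune SQL Lookup:Intune Compliance State EQUALS Compliant
Action:    [Allow Access Profile] + Corporate-Device-Role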

Three more things to note about this.

If you’re limited as to what can go into the certificate CN or SAN you can use the same method to pull out other details, for example with a query like this:

select attributes->>'Intune User Principal Name' as "Intune User Principal Name",attributes->>'Intune Compliance State' as "Intune Compliance State",attributes->>'Intune Device Registration State' as "Intune Device Registration State" FROM tips_endpoints WHERE attributes->>'Intune Device Name' = '%{Authentication:Full-Username}';

This method requires the device to have been downloaded to Endpoints by the Intune Extension. If that doesn’t happen it won’t be there to match on. In some circumstances Intune no longer records MAC addresses for devices – notably self-registered personal Android devices – and because the Endpoints Repository is based around MAC addresses these devices will be missing.

This is one of the reasons it’s worth using the Intune ID as your device’s UID in the certificate – if you need to query the Intune extension via HTTP you’ll need to present it with the Intune ID, nothing else will work. It’s worth noting devices will also have an Azure AD ID which looks similar, and will likely be the same for devices not managed by Azure AD, but the Intune API only understands the Intune ID when querying for device attributes.

Yet another Ser2Net tutorial

I often spend time away from home and want to be able to reach my home lab, both hardware and virtual. I use a WireGuard VPN, running within Home Assistant on an RPi 3, for the remote access, which means the network side is sorted. However I often find I’m juggling a few projects and might need to rebuild a hardware controller or an AP… that essentially needs console access. Whilst I could use a dedicated DC-style console server, they’re expensive, awkward, and overkill, so I use Ser2Net on another RPi. There are plenty of tutorials on how to set up Ser2Net which are probably better than this one, but everything I found is based on an older version. Since then the config has changed to YAML and I found the defaults didn’t behave as I expected… so here we are.

Ser2Net “provides a way for a user to connect from a network connection to a serial port” – so says the project author Corey Minyard. You define a TTY interface and how you would like to connect to it. By default Ser2Net forwards raw data over TCP via a specified port to and from the TTY interface. Again, by default, you can only access a TTY over the network from localhost.

I threw my console server together using the latest version of Ubuntu for Raspberry Pi, Ubuntu 23.04 at the time of writing, running on a Pi 2b. Any old RPi is a good choice for the low power consumption and very light requirements.

Ubuntu 23.04 repositories contain ser2net version 4.3.11, which differs from previous versions in that it uses YAML for the config. This is found in /etc/ser2net.yaml.

I’ve used three different types of USB console – a couple of cheap FTDI cables from Amazon, the USB console interface on an Aruba 7005 controller, and the Aruba TTL-to-serial cable for an AP. All were recognised by the OS.

How to set it up:

  • Build your Pi (or whatever machine you’re using) with Ubuntu 23.04 (or later)
  • Run sudo apt update && sudo apt upgrade (just because we always should)
  • Install ser2net with sudo apt install ser2net
  • Connect the USB serial interfaces and issue the command: sudo dmesg | grep ttyUSB

This will show you the USB to serial interfaces that have been recognised by the OS. It will look something like this:
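
For a cheap FTDI cable, for instance (timestamps, bus numbers and the driver name here are illustrative and will vary by adapter and kernel):

[   12.345678] usbcore: registered new interface driver ftdi_sio
[   12.346011] ftdi_sio 1-1.2:1.0: FTDI USB Serial Device converter detected
[   12.350492] usb 1-1.2: FTDI USB Serial Device converter now attached to ttyUSB0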

You can now add these connections to the YAML file.

A quick note on security. Ser2Net doesn’t have any authentication. You can restrict the listener to localhost, as distinct from the host IP, and that means everything is protected by the strength of your ser2net host logon. I just want a telnet port forwarded to the TTY so it’s easy. This is not a good idea for any production environment without having other layers of security. In this case it’s a lab, and it’s only accessible either in person or via my VPN… so it’s good enough for me, but might not be for you.

The YAML for my console interfaces looks like this:

connection: &con0096
    accepter: telnet,192.168.26.3,2000
    enable: on
    options:
      banner: *banner
      kickolduser: true
      telnet-brk-on-sync: true
    connector: serialdev,
              /dev/ttyUSB0,
              9600n81,local
 

You need to increment the connection number, the port (2000 in this example) and the connector device (/dev/ttyUSB0) for each additional connection. If you have duplicates it won’t work properly, though I believe you can use the same device in multiple connections to allow different port settings. The IP address of the accepter is where it listens for a connection. The default config has this set to tcp,localhost,xxxx which passes raw data over TCP (use something like nc) and is only available from the local machine. It probably goes without saying, but I’ll say it anyway: under the connector, be sure to check the serial port settings are correct.
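
As a sketch, a second connection for the next adapter might look like this (the anchor name and port number are arbitrary, and the baud rate should match the attached device):

connection: &con0197
    accepter: telnet,192.168.26.3,2001
    enable: on
    options:
      banner: *banner
      kickolduser: true
      telnet-brk-on-sync: true
    connector: serialdev,
              /dev/ttyUSB1,
              9600n81,local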

After changing the config file restart the service with sudo systemctl restart ser2net

I can now telnet to my ser2net host of 192.168.26.3 and, depending on the port, I get connected to a different machine. I have four interfaces connected and could use a USB hub or multi-port serial interface to access more machines.
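
Reaching the consoles from another machine on the VPN then looks something like this, using the IP and ports configured above:

telnet 192.168.26.3 2000    # console on /dev/ttyUSB0
telnet 192.168.26.3 2001    # console on /dev/ttyUSB1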

When you should use WPA3 transition mode

Wi-Fi is backwards compatible so, if you really want to, you can connect that old HTC TyTN running Windows CE from 2006 to the latest Wi-Fi6E AP. There are good reasons not to support some of the oldest parts of the Wi-Fi standard if you don’t need to, so we tend to trim the lowest data rates supported and may choose not to use 2.4GHz for some SSIDs, for example.

We generally want our Wi-Fi networks to be secure however, so it’s a good idea to avoid using deprecated security such as WEP. Wired Equivalent Privacy turned out to be nothing of the sort and, once broken, was trivial to bypass. It should never be used, nor should WPA TKIP, the gaffer-taped fix for WEP.

WPA2 has been king for some years now, in fact it’s really quite old and it has limitations. It isn’t considered completely broken like WEP or WPA, but it has issues (which I won’t go into here) and so we get WPA3 as the latest offering for authentication and encryption.

It may seem obvious to switch to this latest and most secure option but that relies on your infrastructure and all clients supporting it.

This is where it gets tricky… because clients have a bad habit of sticking around. I recently worked with a customer whose industrial and warehousing equipment didn’t support WPA3 at all, despite the latest hardware version being released in 2021. Even if your client hardware can support WPA3, do drivers need updating before this works properly? Probably. Has this been done? Probably not.

WPA3 comes with a transition mode that allows WPA2 clients to connect to the network. However, at this point you’re essentially running WPA2 and subject to its drawbacks, at least for any clients that can’t support WPA3. What’s more, because these clients work just fine, it’s harder to form a business case to replace them or push updates up someone’s list of priorities.

It’s for this reason WPA3 transition mode is probably not a great idea on many occasions.

That said, I’m about to deploy it… and here’s why I think it’s the least bad option:

Nobody knows what the clients will support. There’s no coherent list of what clients exist on the network at all, and no time to gather this information. We have to assume that some clients won’t support WPA3 at all, or not without action. The desire is to use WPA3 as soon as possible, but any disruption to clients is also problematic.

By using transition mode, clients that can support WPA3 will do so. Those that cannot can be audited as connecting with WPA2 and updated or replaced. Once all clients are using WPA3, or at some deadline set by the security team, transition mode can be switched off. Most clients will not see this change as a new network, so the disruption to WPA3 clients will be minimal.

Under ideal circumstances a new network would be deployed without transition mode and clients would like it or lump it… however life doesn’t work that way, and we really do need to transition to WPA3.

Video doorbell

Not really a huge fan of these things, but after missing a few deliveries it’s really a must. So I’ve got one – a Lorex 2k QHD Wired Video Doorbell.

Don’t let that word “Wired” fool you, this is a Wi-Fi device. It takes power from an existing doorbell transformer and runs on 16-24V AC.

It’s part of the Lorex Fusion Collection so there’s a network NVR and a range of cameras that work alongside it. I chose this as much for what it isn’t… it isn’t a Ring. It also ought to be easy to install, comes with a chimekit for linking an existing mechanical doorbell chime, doesn’t look too bad, records locally onto SD card and you can stream the video from it with accessible RTSP feeds:

Main stream: rtsp://ip/cam/realmonitor?channel=1&subtype=0
Sub stream: rtsp://ip/cam/realmonitor?channel=1&subtype=1
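
You can check these feeds with any RTSP-capable player. As a quick sketch with ffplay – note the credentials and their placement in the URL are an assumption; most of these cameras want the account you set up in the app:

ffplay -rtsp_transport tcp "rtsp://user:password@ip/cam/realmonitor?channel=1&subtype=0"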

Despite taking care to check the Wi-Fi performance in the installation location, I made a major error and didn’t test with the door closed. It turns out my front door presents significant attenuation, as does seemingly every wall in my house. This appears to be a particular feature of many new-build houses, with their foil-backed insulation and plasterboard. I also suspect the Wi-Fi radio/antenna in the doorbell isn’t great.

One update since the video was completed concerns the notification issue: if notifications are disabled for the device, you will still be notified when someone presses the doorbell. This solves the notification fatigue issue I referenced; it just isn’t at all clear in the app that this is how it works.

Big, fat, bloaty channels

Dip your toes into the world of enterprise Wi-Fi and the mantra is “only use 20MHz wide channels”, yet this is not the default for most vendors, and then you might notice pretty much every ISP router supplied to domestic customers (at least in the UK) is using 80MHz channels… so what gives, and when are these big bloated wide channels a good idea?

Perhaps the first thing to understand is what this even means. We’re talking about the 5GHz band, ranging from 5150MHz to 5850MHz. For Wi-Fi this is divided up into 20MHz channels, although not all of this spectrum is available in all countries. In the UK most enterprise Wi-Fi vendors offer 24 channels for indoor use. A 40MHz channel is simply two neighbouring 20MHz channels taped together. (More information can be found in Nigel Bowden’s whitepaper.)

Wi-Fi speed depends on a lot of variables but chiefly it comes down to the Modulation & Coding Scheme (MCS), the number of spatial streams supported by the client and Access Point (two spatial streams is twice the speed of one, for example) and the channel width being used. A 40MHz channel has double the throughput capacity of a 20MHz channel (actually it’s ever so slightly more than double, but let’s keep it simple) and 80MHz can double that again.

Back to ISPs. BT currently recommend I take up their full fibre service offering 150Mbps download speed. I’m going to expect to see that when I run a speedtest from my iPhone. So what does that mean for the Wi-Fi?

The first thing to identify is that my client, the iPhone XS Max, supports Wi-Fi5 (802.11ac) with one spatial stream. So if we take a look at the MCS table (we’re interested in the VHT column) the fastest speed we can achieve is 86.7Mbps for a 20MHz channel. Importantly this is the raw link speed, various overheads mean you’re not going to see that from your speedtest application. What’s more this is the best we can do in ideal circumstances. If my Wi-Fi router is a room or two away it’s unlikely the link will reliably achieve that MCS Index of 8.

So why does the BT router use 80MHz channels when it looks like a 40MHz channel should let us reach our 150Mbps line rate?

Two reasons. Firstly, BT sell services with faster line rates of around 500Mbps; secondly, remember these headline speeds are only achieved in optimum conditions. So by using an 80MHz channel we’ve got up to 433.3Mbps of Wi-Fi capacity for our single-stream client, which increases the chances of hitting a real-world 150Mbps throughput around the house.
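
Putting some numbers on that for a single-spatial-stream 802.11ac client, using the top MCS for each width with a short guard interval (from the standard VHT MCS table; note MCS 9 isn’t valid at 20MHz for one stream):

20MHz, MCS 8:  86.7Mbps
40MHz, MCS 9:  200Mbps
80MHz, MCS 9:  433.3Mbps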

“So what?” you may ask. Well you don’t get something for nothing, there’s always a trade-off. Remember Wi-Fi only has a finite amount of channel capacity and we need to be deliberate in how that’s used.

For enterprise networks we’re typically less concerned with the maximum throughput a client can achieve versus the aggregate throughput of the whole network. Basically, it’s not about you it’s about us.

Creating good coverage for an office space means multiple access points. We ideally want each of those access points to be on a separate channel, or at least to have APs on the same channel as far apart as possible. Because using wider channels limits how many channels you have available, it reduces the effectiveness of channel reuse in larger networks. That means an increased risk of interference between APs, resulting in collisions, lower SNR and ultimately lower throughput.

This is why a large, busy network running on 80MHz channels can be expected to have lower aggregate throughput than with 40 or 20MHz channels.

There’s also the important matter of noise.

Noise is signal on our channel, picked up by the receiver, that isn’t useful signal we can decode. The key to achieving a high MCS value is a high Signal to Noise Ratio (SNR). For each doubling of the channel width (from 20, to 40, to 80) the noise level is doubled too.
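
As a rough worked example, considering only the thermal noise floor (and ignoring the receiver’s noise figure), noise power is -174dBm/Hz + 10·log10(bandwidth in Hz), so each doubling of channel width adds about 3dB of noise:

20MHz: -174 + 73 ≈ -101dBm
40MHz: -174 + 76 ≈ -98dBm
80MHz: -174 + 79 ≈ -95dBm

So an 80MHz channel needs roughly 6dB more signal than a 20MHz channel to deliver the same SNR.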

Back to ISPs… again. My hypothetical BT router is running on the same UNII-1 80MHz channel as my neighbours on either side, which means there’s a very high chance of interference. So although BT have chosen this bloater of a channel to improve throughput, it could do the opposite. In most cases you get away with it because our houses provide sufficient attenuation, especially at 5GHz. But in densely populated areas – flats, for example – it can be the case that neighbouring Wi-Fi networks are really very strong.

Which, finally, brings me to where you can successfully use these wide channels: anywhere you’re not competing for channel space.

So for a small network install in an area that doesn’t have neighbouring networks it can work really well. I’ve tested using 80MHz channels with my home network, simply because I can. The house and my home office have foil-backed insulation which does a good job of blocking Wi-Fi. What’s more, the ISP-supplied routers all tend to use UNII-1 channels – the first four of the band. I’m using Aruba enterprise APs so can select other channels that nobody is using nearby.

And so we reach some sort of conclusion, which is: yes, 20MHz channels are still the right way to go for most enterprise deployments. You can use wider channels if you know you have the capacity for them and you’re not ruining your channel re-use plans. At home, if you’re not getting anywhere near the throughput you think you should, you might be suffering from everyone effectively using the same channel. But don’t forget to test with a few different devices; your phone is probably the worst-case scenario.

Incrementing a ClearPass Endpoint attribute

This post is based on the SQL query found here. It’s clearly a fairly niche requirement but it’s come in very handy.

Let me set the scene… An open Wi-Fi network is provided for a major public event, but it’s only there for accredited users, not the general public, who authenticate with a captive portal. Some members of the public will no doubt try connecting to the network, reach the portal and find they can’t get anywhere.

The problem is every connection consumes resources. If enough people do this there could be DHCP exhaustion and issues with the association table filling up. Assuming everything is sized appropriately it’s likely to be the number of associations that are the primary concern.

There are various answers to this configuration, most of which revolve around better security in the first place, but there are good reasons it’s done this way… let’s move on.

MAC caching is being used by ClearPass so the logic says: “do I know this client and if so is the associated user account valid? If so return the happy user role, if not return the portal role”.

This means there are no auth failures – we’re never sending a reject, ClearPass returns the appropriate role.

What we want to do is identify clients that don’t go through captive portal authentication, and therefore just keep being given the portal role.

I added an Endpoint attribute of “Counter” (Administration\Dictionary Attributes)

Next a custom filter is added to the Endpoints Repository. This query (courtesy of the wonderful Herman Robers) reads the counter attribute into a variable of “Counter”. It also reads the counter attribute and adds 1 for the variable “Counter1”.

SELECT attributes->>'Counter' as Counter, (attributes->>'Counter')::int +1 as Counter1 FROM tips_endpoints WHERE mac_address = LOWER('%{Connection:Client-Mac-Address-NoDelim}')

Add this as an attributes filter under Authentication\Sources\Endpoints Repository

Within a Dot1X or MAC-auth service you can then call the variable: %{Authorization:[Endpoints Repository]:Counter1}

An enforcement profile is created to update the endpoint with the contents of Counter1 and this is applied alongside the portal role.
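
As a sketch of that profile (using the ClearPass Entity Update enforcement template; the attribute name matches the one created earlier):

Template:  ClearPass Entity Update Enforcement
Type:      Post_Authentication
Attribute: Endpoint : Counter = %{Authorization:[Endpoints Repository]:Counter1}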

The result is each time a client hits the portal we also increment the counter number. At a threshold, to be determined by the environment, we start sending a deny. In my testing I set this to something like 5 to prove it worked.
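
The enforcement policy rule for the threshold might then look roughly like this (the threshold of 5 is the test value mentioned above, and the deny action could equally be a custom reject profile):

Condition: Authorization:[Endpoints Repository]:Counter GREATER_THAN 5
Action:    [Deny Access Profile]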

This works well, but a persistent client can just keep trying to connect, which can still consume some resource as the AP has to generate auth traffic.

In this case the network is using Aruba Instant APs which have a dynamic denylist function. This was set to block clients for one hour after 2 authentication failures.

What happens now is after a client has hit the portal 5 times, ClearPass sends a reject, the client almost immediately tries again and is rejected, at which point it’s added to the denylist and can no longer associate with the network.

There are risks to this approach – it’s easy to see you could end up with false positives being denylisted. Clearly a better overall solution would be to avoid deploying an open network but that opens a whole other can of worms when dealing with a very large number of BYOD users.