More RadSec fun

A previous post waffled about setting up Radsec between an Aruba AP and ClearPass running in Azure. Having deployed this in a slightly different Azure environment I’ve learned some things worth sharing.

The first thing is ClearPass handles RadSec using RadSec Proxy. This receives the RadSec connection and proxies the RADIUS traffic to the ClearPass RADIUS server. One casualty of this approach is that, at the time of writing, Policy Manager sees these incoming connections as being from localhost.

There are circumstances where it’s useful to know which ClearPass cluster member has dealt with a request. For example if I don’t have a cluster VIP, and that’s not supported in Azure, it’s possible to build HA for a guest portal if you know which server handled the initial MAC-auth. With RadSec you don’t, so this can’t be done.

My Azure lab was very simple, just a ClearPass VM running in an Azure tenant with no Network Security Groups configured. In the production environment I’ve worked with ClearPass was placed in a NSG protected by Azure Firewall which performed source NAT. As a result incoming RadSec connections all have a source of the NSG firewall private range, rather than the true source.

The RadSec tab of the Network Device config in ClearPass lets you set an override source IP. Previously I put the NAT public IP of my client in here. In production I’ve used the internal /26 network range of the azure firewall. The config box takes a range, in this case its 172.20.0.0-62.

I’ve then used the option to validate the certificate, using the CN or SAN and entered the CN name of the cert issued to that client.

The final point worth noting is neither ClearPass nor the Aruba AP issue keepalives down the TLS tunnel. The Azure firewall drops idle connections after four minutes. This is not configurable. So you will see lots of up and down messages in the event viewer. When I was initially testing this in my lab setup this was an indication of a problem. In production it was just normal behaviour, which caused some confusion.

In a busy network with lots of authentication traffic you’ll probably see the tunnel stay active for much longer. I’m not entirely convinced all is well, but it’s working.

IPSec over fortinet site to site vpn

Here’s a little gem found by a colleague.

A customer has Aruba AOS 8 Gateways communicating with the Mobility Conductor via a fortinet site to site VPN.

The Fortigate units were upgraded and the IPsec traffic from the Gateways to the Mobility Conductor was being dropped. Nobody noticed for a month until the Gateways stopped working because the licenses couldn’t be validated.

My colleague was able to prove the traffic was getting to the Fortigate and was passing through the rules but didn’t arrive at the other end of the tunnel. This smelled to me like an ASIC or acceleration issue, and that’s exactly what it was.

Lots of head scratching later… the issue was solved with npu-offload disable.

The new Fortigates had this acceleration function enabled by default and, it turns out, you can’t pass the Aruba IPSec traffic across the IPSec site to site VPN with npu offload enabled. It might be possible to make this work by changing the IPSec parameters, although I’m not even sure it that’s possible.

As it is, this customer doesn’t need to accelerated performance, they just wanted it to work.

Congrats to Colin Irwin who figured this out.

Fun and games with RadSec

Most of the time we’re using RADIUS over a private management network and don’t really care too much about security, but that changes when you put your authentication server in AWS or Azure and send authentication traffic over public internet. RADIUS is not secure and uses an MD5 hash for encryption that doesn’t meet modern standards. This is where RadSec comes in.

RadSec is effectively RADIUS over TLS with client and server certificates used to authenticate and encrypt the traffic. RadSec uses mutual authentication – at the setup of the tunnel both the client and the server need to successfully authenticate each other for everything to work. Worth noting RadSec only uses a single port – TCP:2083 – rather than separate ports for authentication, accounting and change of authorization (dynamic RADIUS).

To lab this I have ClearPass running in Azure. The authenticator is an AP on a private network, connecting to public internet via a NAT gateway and managed by Aruba Central.

For the server side this certificate authentication is probably familiar territory. You add a server certificate, signed by a CA the client will recognise, and that’s about it. You may of course need to add the CA root to your client’s certificate store. Here’s my RadSec certificate in ClearPass, signed by a private CA.

The network access device entry has the actual private IP address of the AP so I can identify it more easily but ClearPass will see an incoming connection from the NAT public IP. I’ve entered this into override IP under RadSec settings so ClearPass will accept the connection. I can also add certificate validation checks. This lets you separate multiple authenticators behind a single NAT IP. For example if you have both switches and APs on a site and need to handle these differently, placing them in different NAD groups or applying varying attributes. In this case I’m not performing any authorization checks.

Because it’s a private CA I’ve uploaded the root certificate that signed the server certificate (labca_root) into Central and applied this on the group security settings. The CA root is needed for the APs to authenticate the server. I’ve also uploaded a client certificate (radsec_client_san) that the APs use when contacting the server. The same CA was used for this so the root certificate has also been added to the ClearPass certificate trust list.

The way Aruba Central handles things, these certificates are used by all devices in that group. So all my APs will use the same client certificate. Best practice with Aruba Instant is to proxy RADIUS traffic via the Virtual Controller. This changes in AOS10, but it’s likely a single certificate used to authenticate all devices within a group is just fine. You can assign certificates per device using template groups but you’d really need a very good reason to want to do that. Don’t make this more complicated than it needs to be.

That’s where most of the documentation ends. But it didn’t work.

The one place I got stuck for an embarrassingly long time was getting the client certificate to work. I’m no expert on working with OpenSSL but to save you my pain, here’s what I eventually realised.

All certs, server and client, should have a subject alternate name with the IP address. This means things still work if the client can’t resolve the DNS name of the server, and it’s essential for the client certificate to have the IP address from which the client will connect included, so in this case it’s the NAT public IP. You can include multiple IPs for a certificate that’s being used across various sites. I don’t believe there’s a viable solution for not having static addressing. Dynamic DNS won’t help you here.

The client certificate needs to be just that, a client cert. Specifically it needs the X509v3 extended key usage variable of: TLS Web Server Authentication. In this output from OpenSSL you can also see the Subject Alternative Name appears where, trust me, the IPs are listed.

If you don’t have either the IP address your authenticator is connecting from in the certificate, either as the CN or in the SAN, or the TLS Web Server Authentication extension it won’t work and you’ll get unhelpful errors on the AP such as:

radsec_read_from_tls_socket:311: some error(6) encountered while reading

If, like me, you’re using ClearPass and, unlike me, have an Onboard CA you can generate the client certificate there. For my purposes I’m using OpenSSL to do this. There lots of tutorials in how to use it, but here’s what I did.

Create a config file with the necessary settings, which I saved as san.cnf

[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no
[req_distinguished_name]
C = GB
ST = The Shire
L = York
O = Homelab Inc
OU = IT
CN = aps.homelab.home
[v3_req]
extendedKeyUsage = serverAuth, clientAuth, codeSigning, emailProtection
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = <ip1>
DNS.2 = <ip2>

Generating the CSR: openssl req -nodes -newkey rsa:2048 -keyout radsec_client.key -out radsec_client.csr -config san.cnf

Making the certificate – for this I’d already created my private CA root, which is what’s referenced as the root.pem and root.key: openssl x509 -req -in radsec_client.csr -CA root.pem -CAkey root.key -CAcreateserial -out radsec_client.crt -days 7200 -sha256 -extensions v3_req -extfile san.cnf

In order to import the client cert it often needs to be in pkcs12 format: openssl pkcs12 -export -out radsec_client.pfx -inkey radsec_client.key -in radsec_client.crt

There’s no requirement for the client and server certificates to use the same CA, in this case it allows easy rolling of certs that are valid for a long time, which is beneficial for this use case. Using OpenSSL for a simple private CA is really, easy, here’s a good run through.

As I said before, getting all this to work took an embarrassingly long time. In my defence this is mainly because when things don’t work there’s very little useful information logged, hence this post which may be useful to someone, not least me when I need to do this again in a few years.

One final thought on this. Certificates get forgotten and expire causing merry hell with infrastructure when they do. In this deployment, using ClearPass and Central managed APs it’s trivially easy to update the certificate so if you have to use a CA that doesn’t let you create long expiry dates that’s probably not the end of the world. However be sure to set a diary reminder so that work can be scheduled. It’s quick and easy to do but it’s still potentially disruptive.

As certificate expiry terms get ever shorter, there remain areas where mutual authentication is valuable but changing things, even by automation, is undesirable. I’d still advocate using a nice long expiry date for certificates internal to infrastructure. I admit the 20 year cert life span of my lab CA is, perhaps, a bit extreme

The wall of RF doom

Here’s a video post about an interesting building construction that, probably, few will encounter so I thought it might be interesting. Also the perils of making assumptions about building construction when working on Wi-Fi designs.

My CWNE essays

Recently my friend and former 😢colleague Mac_wifi has called on others in the community to publish their CWNE application essays as a resource, benchmark, encouragement to others seeking to achieve the CWNE certification. I think this is a good idea. Nobody had asked me to write an essay in over 20 years and I wasn’t quite sure where to start. Hopefully a few of the current crop of CWNEs sharing their submission will help guide others.

Here’s a link to that blog post

I agree with the CWNP’s approach to their expert level. If you’re used to sitting multiple choice exams through Pearson Vue for your IT certs the need for an essay might seem a bit strange. Indeed many certification streams award the expert level automatically once enough exams have been passed. There’s nothing wrong with that approach but CWNP chose to take a different route for a good reason.

To become a CWNE, in addition to passing the exams, you need at least five years experience working in enterprise wireless networking, have people prepared to endorse you and write three essays to demonstrate that knowledge and experience.

What that means is a CWNE knows their stuff, has walked the walk and … well you know how the rest of that goes. It means a CWNE has experience that goes beyond the theoretical knowledge one can gather by studying for certification.

Some advice I’d offer anyone aiming for CWNE certification is to gather real world examples of solutions delivered and problems solved for your essays. In the rushing around of daily life it’s easy to fix a problem, move on to the next thing and then forget about the details. It’s also worth remembering your essays don’t need to be a technical tour de force of excellence, proving your capabilities as a Wi-Fi super human. We all know the real world is full of compromises. Name them and explain them. I consider an essay that knowingly describes a flawed implementation speaks more of experience than a textbook “perfect” network design, for example.

So those essays…

Essay 1 – Design
Essay 2 – Security
Essay 3 – Analysis

I’m always happy to talk further with anyone about this, if you want to get in touch you can reach out to me on twitter

MOAR Channelz!!!

Something that happened in November 2020 but that slipped past me until today is the FCC opened up some of the UNII-4 band for unlicensed usage, which includes Wi-Fi.

It isn’t much, only 45MHz but because of where it sits, just beyond UNII-3, it’s really useful extra bandwidth.

UNII-3 is where channels 149-165 live with the band providing 5 x 20MHz channels, 2 x 40MHz and therefore 1 x 80MHz. Opening up just part of the UNII-4 band nicely rounds things off at the top of the existing 5GHz Wi-Fi spectrum allowing another 3 x 20MHz channels (169, 173 & 177), 2 x 40MHz, 1 x 80MHz and therefore making a new 160MHz channel possible.

This is, of course, excellent news. We love more channel capacity especially as it’s likely many chipsets will already support these channels so if we’re lucky extra capacity can be made available with a firmware upgrade.

To bring reality crashing in, it’s worth noting this is the FCC… I don’t believe there’s any word about what European and UK regulators are going to do. Hopefully they’ll follow suit, but we’ll have to wait and see.

Also, whilst extending bands is a great way to add capacity with existing equipment it raises the spectre of compatibility problems. The lovely enterprise Wi-Fi network you manage may well support these channels but if client devices do not then you can’t use them.

Getting certified

Recently I’ve been on a bit of a certification binge, gathering ACCP, ACMP, and ACSP from Aruba, CCSA from Check Point and I was very excited to achieve the expert level certification from CWNP. I am now CWNE #420 (yes I know….)

For a long time I considered a lot of IT certifications to be… a bit thin, maybe not worth the paper they’re written on…. that sort of thing. I thought I had good reasons for this but now I think I misunderstood what that certification meant. So what do I think now?

Anyone in the UK is probably familiar with the MOT. The annual road worthiness test cars over three years old are subjected to. A well serviced car should always sail through but just because your car passes the MOT, doesn’t mean it’s actually road worthy, or even safe.

What you’re getting is a snapshot of a moment in time when everything appeared to be ok. It doesn’t mean you can just ignore everything for another year. If the brakes don’t seem right, you’ll get ’em checked.

Before I fall too far down this analogy, what’s my point?

Most IT certifications are awarded after an exam that tests knowledge and understanding as best as possible. What’s very hard to test is an individual’s ability to apply that knowledge…. experience.

I have met people who were well qualified, with all the appropriate IT certifications who, despite this, just couldn’t fix the problem, design the thing, make it work. Just like the MOT, having the certification alone doesn’t guarantee anything.

Certification plus experience is a great combination. Quite a few certification paths suggest you should have a few years experience under your belt before you start. Someone who has been working in a field for a few years and has achieved the relevant certification shows much more than a newbie who’s crammed for the exam.

Certification is a great way to learn a product. You might have been working with Aruba ClearPass for several years, as I have, but maybe you’ve never touched Onguard. To pass the exam you need to learn about the whole product, even the bits you don’t use day to day.

Certification without experience still has value, but less value imho. Some people are just good at exams. They can cram. They can learn the correct answers. They can remember enough, for long enough, to get through. That doesn’t prove they can take that knowledge and apply it appropriately, but it might just give the edge if they’re up against someone with neither experience nor certs. At least it shows some drive and interest.

Just as the MOT certificate hopefully means the car is actually roadworthy, IT certification should tell you someone understands the principles and concepts of the product. If it turns out they don’t, you’ll probably discover that pretty quickly.

For many years I was looking after too many things to really get to grips with everything. I became very good at working out how to make something work, doing that, and then forgetting about it.

Personally I’ve found studying for certification exams to be a great way to focus the mind. To dig into products that previously I might have skimmed over, things I’ve never understood but could make work I now have a much deeper knowledge of. That’s important when they stop working…. or you need to justify to someone why you should be doing this thing and being paid well for it.

That’s now what I consider the IT certs I’ve achieved to be all about. They’re a framework for learning something focused on a technology or specific product and then validating that.

The further down a track you find interesting, the more rewarding it is to get the achievements.

As everyone who’s been granted CWNE status will tell you, it represents a lot of work – not just learning to pass exams but actively working in the field and gaining experience.

Quite apart from anything else the gentle ego massage of passing an exam doesn’t get old.

Image

Aruba UAP discovery

How a factory default Aruba Universal AP discovers just what it’s supposed to be in the world is a source of some confusion, not least because the process changed (I believe around AOS 8.4). For a thorough explanation of how things used to work see here.

Put simply all current UAPs (shipped with AOS8.6 or 8.7 at the time or writing) take a cloud first approach, which flips on its head what Aruba APs always used to do.

It’s perhaps worth explaining what a Universal AP actually is. Aruba UAPs can operate in Instant or Campus mode. A campus AP (CAP) talks to a Mobility Controller. An Instant AP (IAP) can be standalone or operate as part of an IAP cluster with one AP taking the role of Virtual Controller. It used to be each model of AP had separate CAP and IAP versions. An IAP could be converted to CAP mode but a CAP could only be a CAP. There was some weird geopolitical reason for this that eventually went away hence the more sensible Universal AP became the norm.

So what happens?

A factory default UAP follows this logic:
– Attempts to contact Aruba Activate
– Connect to Airwave if DHCP option/DNS entry exists
– join a virtual controller on the local subnet
– Listen for a virtual controller (layer 2)
– Look for a mobility controller (layer 2 ADP/DHCP options/DNS)
– Broadcast setmeup SSID
– Reboot after 15 minutes

Of course if the AP receives config at any of these steps, it does what it’s told and doesn’t proceed through the discovery logic.

What’s changed in this list from older software version (pre 8.4) is the cloud wins. If your UAP can connect Activate, and there’s a rule there telling it what to do, that’s what it does. If that rule is incorrect you could see some unexpected behaviour.

The other significant change is looking for a mobility controller is now pretty much bottom of the list. You might be used to an AP looking for a mobility controller as the first thing it does, which indeed is what used to happen, but not longer. So if you run a controller environment and manage to end up with a VC running on an AP subnet you’ll find other new APs form a IAP cluster with this and nothing appears on the mobility controller. Once this happens it’s possible to convert all the APs to campus mode, pointing them at a mobility controller, so this isn’t too painful.

This cloud first approach makes a lot of sense and when it all works smoothly Activate rules can make ZTP provisioning of a new UAP very smooth.

In many deployments it isn’t unusual to have no internet access for APs, especially campus APs. In this scenario, to avoid odd behaviour, make sure your AP subnet is configured with the correct DHCP options or DNS entry for the mobility controller.

If you find a new UAP doesn’t appear to be working, e.g. it hasn’t appeared on your mobility controllers, there’s every possibility that something else along the discovery logic has taken precedence.

Finally it’s worth noting you can manually set controller options from the AP-boot prompt by connecting to the AP console and interrupting the boot sequence. Any static configuration is acted on first. This approach doesn’t scale of course, but it’s useful for testing and building a small lab environment alongside an active system.

The clients you can’t control

You’ve just upgraded the network with the latest Wi-Fi6 APs – this promises to be faster, with lower latency and all round better for everyone and everything… great! But…. there are rumblings.

During your testing you found a number of the corporate laptops used an Intel Wi-Fi NIC and the driver had never been updated… these hit a well known bug that causes the driver to ignore Wi-Fi 6 enabled BSSIDs. No problem, because you did the testing that issue was found and a new driver was deployed.

Despite all your efforts, a number of helpdesk calls have come in from users complaining they can’t connect to the network any more. Some of them can’t even see the network…. Hmmm.

Turns out these machines haven’t had the new driver deployed by Group Policy because they’re not part of the domain. They’re BYOD, they have the same ancient driver and they won’t play ball with the 802.11ax network you’ve just deployed.

That’s not all. The old network didn’t have any of the roaming enhancements enabled and with all change it seemed the perfect opportunity to enable them: 802.11k/v/r all switched on.

Some of the misbehaving laptops can connect to the network, sometimes, but things are really unreliable. These also have an older Intel 7260 Wi-Fi chipset but updating the driver doesn’t help.

You’ve been struck by another Intel bug where the presence of the 802.11K Quiet information element upsets things and they break. This time it’s a hardware problem.

So do you switch off 802.11ax and 802.11k on any SSIDs used for BYOD or do you say “tough, your old stuff might not work any more”?

That, of course, is a policy matter.

When I encountered both these issues in a recent deployment, the decision was to take the path of least resistance and disable the functions. This means the network can’t benefit from the performance and capacity benefits offered by Wi-Fi6.

Not having any control over BYOD clients means they may end up dictating terms for the network. That might be fine, if it fits the policy, but in this scenario it was done because it was easiest, it made the problem go away. If that decision isn’t revisited later the network will always be operating below it’s potential.

AAA on the Wi-Fi

If you know any enterprise networking you’ll have come across AAA – Authentication, Authorization and Accounting. The cornerstone of network security that is ensuring the client can be authenticated, so not just anyone can connect – and often that first A is as far as it goes.

So what of Authorization?

This is what provides a way of making your Wi-Fi more efficient. If you have corporate devices, BYOD, IoT and they currently have three separate SSIDs (not uncommon) you can put all three onto the same SSID, reducing management traffic, and use the Authorization part of AAA to determine what network access each client should have.

This might use Active Directory group membership to determine which VLAN a user gets dropped into, or what ACLs are applied. In many systems both these are part of the Role Based Access Control is the term used for this.

Accounting is where you see what the user actually did. In practice this usually takes the form of when they connected to the network, how long for and how much data they transferred.

Here’s how it comes together in an example recent proof of concept for a customer:

Multiple departments are in a building and it’s necessary to provide security, keeping traffic from each dept separate. For regulatory purposes it’s necessary to assign network services costs to departments but this has to be based on real world information such as bandwidth use. Finally “we’re all one company so we don’t want to setup separate networks”.

Most networks have pretty much everything in place to do this, it’s just a question of whether all the dots are joined.

A RADIUS server (Aruba Clearpass in this case but something like Cisco ISE could be used) is already used for 802.1X authentication. Users from different AD groups can be assigned different roles, placing them in their department’s VLAN or simply applying ACLs specifying the access the client should have. The APs or controller are configured to return RADIUS accounting which allows the administrator to accurately determine the data traffic used for each connection.

Everything needed to do this has been around for quite a long time, but an awful lot of networks out there still have one SSID per VLAN, an SSID for each client type, and SSID for each day of the week. There are much better ways to do this.