If you're one of the millions of Australians who tried in vain to fill in the census on the ABS site earlier this month, you don't need to be told how badly the project was managed or be reminded of the social media backlash that followed. Despite reassurances from the ABS that the matter has been resolved, it's now more than a week since census night and less than half of the population has been able to complete the form.
According to the ABS, the outages were caused by 'malicious' Denial of Service (DoS) attacks. To make matters worse, the Australian government seems more concerned with shifting the blame to IBM, its technology partner, than with solving the problem. But when you take the blame game out of the equation, two questions need answering: how did a project of this magnitude fail so spectacularly, and what should have been done differently?
Our take on the blame game: what really happened?
According to Patrick Gray of the Risky Business podcast, upstream provider NextGen Networks offered DoS protection services, but both IBM and the ABS turned them down, claiming they would not need them. Instead, they asked NextGen to geoblock any traffic originating from outside Australia in the event of an attack. When a small DoS attack was detected, the geoblock was triggered, but a second attack originating from within Australia followed shortly afterwards.
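The geoblocking idea itself is simple to sketch: drop any packet whose source address falls outside an allowed set of ranges. The toy Python below illustrates the concept (the CIDR ranges are documentation placeholders, not real Australian allocations) - and also hints at the weakness the ABS ran into, since an attacker already inside the allowed ranges sails straight through.

```python
import ipaddress

# Toy sketch of source-based geoblocking. The ranges below are
# illustrative placeholders (RFC 5737 documentation prefixes),
# standing in for "Australian" address space.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_allowed(src_ip: str) -> bool:
    """Permit traffic only if the source IP sits inside an allowed range."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in ALLOWED_RANGES)

print(is_allowed("192.0.2.7"))     # False: "foreign" source, blocked
print(is_allowed("203.0.113.10"))  # True: "domestic" source - including attackers
```

As the second attack showed, a geoblock is no substitute for real DoS protection: any traffic that originates inside the allowed ranges is indistinguishable from legitimate users at this layer.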
Gray described this attack as 'a straight-up DNS reflection attack with a bit of ICMP thrown in for good measure', and it filled up the firewalls' state tables. In response, the ABS rebooted its primary firewall, which was running as part of a redundant pair. But because the ruleset hadn't been synced before the reboot, the secondary firewall was effectively inactive, leading to another short outage.
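Why does filling a state table take a site down? A stateful firewall records an entry for every flow it tracks, and the table has a fixed capacity. The following toy simulation (not a real firewall, just an illustration of the failure mode) shows how a flood of distinct spoofed flows leaves no room for legitimate connections.

```python
# Toy model of state-table exhaustion: a stateful firewall tracks each
# flow in a fixed-size table; once it fills, new connections are refused.

class StatefulFirewall:
    def __init__(self, table_size: int):
        self.table_size = table_size   # maximum number of tracked flows
        self.state_table = set()       # currently tracked flow identifiers

    def accept(self, flow_id: str) -> bool:
        """Admit a flow only if it is already tracked or there is room."""
        if flow_id in self.state_table:
            return True                # existing flow, already tracked
        if len(self.state_table) >= self.table_size:
            return False               # table full: connection dropped
        self.state_table.add(flow_id)
        return True

fw = StatefulFirewall(table_size=1000)

# Reflection-style flood: thousands of distinct short-lived flows.
for i in range(2000):
    fw.accept(f"spoofed-{i}")

# A legitimate census user now tries to connect - and is dropped.
print(fw.accept("legit-user"))  # False: state table exhausted
```

Real firewalls age entries out over time, but under a sustained flood the table refills faster than it drains, which is exactly why rebooting felt like the only option.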
A short while later, an alert from IBM's monitoring equipment was incorrectly interpreted as evidence of data exfiltration. With their nerves already frayed by the DoS attacks, the monitoring team became convinced that they had been hacked and that the attacks were nothing more than a ploy by data thieves to distract them from the exfiltration. The end result? The plug was (literally) pulled and the ASD was called in to clean up the mess.
Even though the suspected exfiltration was later confirmed to be a false positive, the ASD still needed to follow incident response procedure before the website could go live again.
What lessons can we take from the way the #CensusFail was handled?
1. Plan ahead
In today's cyber security climate, it's all but inevitable that a large organisation will be attacked at some point. It's no good waiting until your data has been compromised before you implement the necessary data security policies - these should be built into your system from the word go.
Implementing DoS protection for the census site wouldn't have been prohibitively difficult or costly - all it would have taken was a little proactivity. As JFK put it, "The time to repair the roof is when the sun is shining."
2. Create interdepartmental Standard and Emergency Operating Procedures
Part of the reason that the census situation spiralled out of control so dramatically was a lack of communication between departments.
Even if the issue exists in a single segment, the evidence for the problem might be across several points. This can lead to several people working on the same problem unknowingly or issues being misinterpreted and incorrectly classified. This isn't just a waste of resources - at worst, it could mean that your entire admin team is working on a false lead while real damage is being done elsewhere.
3. Have enough bandwidth to be able to cope
DoS attacks bring down sites by saturating available bandwidth or exhausting server resources, leaving nothing for legitimate users.
It's vital to be proactive about bandwidth management, and it's worth considering temporarily increasing bandwidth for periods of high demand - an entire country's worth of people accessing your site to fill in a census form, for example - and implementing good Quality of Service (QoS) policies. By being smart with your bandwidth, you're able to build out a buffer so that your site doesn't come crashing down at the first sign of a DoS attack.
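One building block behind many QoS policies is the token bucket: each source gets a steady allowance of requests, so a burst from one client can't starve everyone else. The sketch below is purely illustrative (real QoS is enforced in network equipment, and the capacity and refill numbers here are made up), but it shows the mechanism.

```python
# Minimal token-bucket sketch of QoS-style rate limiting. Capacity and
# refill rate are illustrative values, not a recommendation.

class TokenBucket:
    def __init__(self, capacity: int, refill_per_tick: int):
        self.capacity = capacity              # burst allowance
        self.refill_per_tick = refill_per_tick
        self.tokens = capacity                # start with a full bucket

    def tick(self) -> None:
        """Advance one time interval, refilling up to capacity."""
        self.tokens = min(self.capacity, self.tokens + self.refill_per_tick)

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_tick=2)

# A burst of 10 requests in one interval: only the first 5 get through.
results = [bucket.allow() for _ in range(10)]
print(results.count(True))  # 5

bucket.tick()
print(bucket.allow())  # True: the allowance has refilled
```

Applied per client or per network, this kind of limiting buys a site time under a flood instead of letting the first burst of traffic consume everything.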
4. Don't just rely on your network monitoring software
No matter how powerful and responsive your NMS may be, it's of little use without an equally skilled team of network admins. At best, an NMS can bring your attention to issues within your environment - you need a team of human beings to analyse, interpret and act on these issues. Knowledge of the network should be shared among all admins so that any issue can be dealt with quickly and effectively.
The #CensusFail is a perfect case in point here - had the monitoring team been better prepared and not jumped to conclusions, the plug may have never been pulled and the census might have gone ahead with far less disruption.
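One simple discipline that helps here: don't escalate on a single anomalous reading. The hedged sketch below (thresholds and window size are invented for illustration) requires an anomaly to persist across several consecutive polling intervals before paging a human for review, which filters out exactly the kind of one-off spike that was misread as exfiltration.

```python
# Sketch of "confirm before escalating": flag an anomaly only when it is
# sustained, not on a single spike. Threshold and window are illustrative.

def sustained_anomaly(readings, threshold, window):
    """Return True only if `window` consecutive readings exceed `threshold`."""
    run = 0
    for value in readings:
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False

# A single spike (a likely false positive) does not trigger escalation...
print(sustained_anomaly([10, 12, 95, 11, 10], threshold=50, window=3))  # False

# ...but a sustained surge does.
print(sustained_anomaly([10, 60, 70, 80, 12], threshold=50, window=3))  # True
```

The point isn't this particular heuristic - it's that the NMS should gather evidence and a trained human should interpret it before anyone pulls a plug.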
5. Test yourself both locally and over the Internet
If you're providing a service to customers over the internet, testing internally will only give you part of the picture. For testing data to be truly valuable, you need to test availability from your customers' point of view by simulating their activity. That data shows you exactly how customers experience your service, and over time gives you a far better appreciation of their needs.
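In practice this can be as simple as a probe that hits the service the way a user would and records success rate and latency. The sketch below injects the fetch function so the same check can run against a real HTTP client or a stub; the URL and the failing-every-fourth-request behaviour are placeholders invented for the example.

```python
import time

def measure_availability(fetch, url, attempts):
    """Run `attempts` user-style probes; return (success_ratio, avg_latency_s)."""
    successes, total_latency = 0, 0.0
    for _ in range(attempts):
        start = time.monotonic()
        ok = fetch(url)                          # True if the page loaded
        total_latency += time.monotonic() - start
        successes += ok
    return successes / attempts, total_latency / attempts

# Stub fetcher standing in for a real HTTP client (urllib, requests, etc.):
# it simulates a service that fails every fourth request.
calls = {"n": 0}
def stub_fetch(url):
    calls["n"] += 1
    return calls["n"] % 4 != 0

ratio, latency = measure_availability(
    stub_fetch, "https://example.org/census", attempts=8)
print(f"availability: {ratio:.0%}")  # availability: 75%
```

Run from outside your own network - ideally from several locations - this kind of probe tells you what your users actually see, not what your internal dashboards claim.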
Had the ABS and IBM been more thorough with their testing, they could have identified the suspected exfiltration as a false positive far sooner and focused their efforts on minimising the damage of the DoS attacks.
If you'd like to know more about partnering with the right technology partner, ensuring that your environment is safe from cybercrime, and having the appropriate perimeter and data security, talk to us and check out PowerCONTROL.