The purpose of any post mortem is to look into the past in order to find ways to prevent similar issues from happening again, and also to improve upon our responses to issues found in the future. It is not to blame others, point fingers, or punish. A proper post mortem states facts, including what went well and what did not, and issues ideas for improvements going forward.
Short rehash: Log4j is a popular Java library used for application logging. In late November 2021, a vulnerability was discovered in it that allowed any attacker to send a short string of characters to an application (for example, by pasting it into the address bar) and, if that string was logged by a vulnerable version, gain remote code execution (RCE) on the web server. No authentication to the system was required, making this one of the simplest attacks of all time to gain RCE (among the most severe outcomes possible) on a victim’s system.
Points of interest and timeline:
- This vulnerability was recorded in the CVE database on Friday, November 26, 2021, but was not publicly disclosed and widely weaponized until December 9, 2021.
- This vulnerability is often referred to as #Log4Shell.
- Log4j can only be found in software using Java and/or jar files.
- Similarly named logging libraries for other languages and frameworks (log4net, log4js, etc.) were not affected.
- Both custom software (made in house) and COTS (commercial off-the-shelf) software were affected, including popular platforms that are household names.
- Many pieces of software call other pieces of software, including Log4j, and thus were vulnerable, despite not seeming to be a problem at first glance.
- Several middleware servers were vulnerable because the library was running inside them.
- The CVE is named CVE-2021-44228
- Log4j versions 2.0-beta9 through 2.14.1 were vulnerable
- A patch, Log4j 2.15, was available by December 10th, and almost immediately a vulnerability (CVE-2021-45046) was found in it that allowed denial-of-service attacks.
- Another patch, Log4j 2.16, followed on December 13th and was also deemed vulnerable.
- On December 17th, a 3rd vulnerability in Log4j was disclosed, CVE-2021-45105; it was fixed in Log4j 2.17.
- On December 28th, Checkmarx disclosed another vulnerability (CVE-2021-44832) within Log4j. It required that the attacker already have control over the logging configuration, meaning the system had already been breached, so it was not nearly as serious as the previously reported vulnerabilities. It was fixed in Log4j 2.17.1.
- Version 2.17.1 or later is widely considered the safest version of this library.
- Log4j 1.x has not been supported since 2015, and while all 1.x versions have several vulnerabilities of their own, all were immune to this one exploit.
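The version history above can be turned into a quick triage check. What follows is a minimal sketch, not an authoritative scanner: the names `FIXED_IN`, `parse_version`, and `open_cves` are hypothetical, the fix versions are an assumption taken from Apache's public advisories for the main 2.x release line, and the Java 6 and Java 7 backport branches (2.3.x and 2.12.x) and pre-release qualifiers are deliberately ignored.

```python
import re

# Assumption: first main-line release that fixed each CVE, per Apache's
# public advisories. Backport branches (2.3.x, 2.12.x) are not modeled.
FIXED_IN = [
    ((2, 15, 0), "CVE-2021-44228"),
    ((2, 16, 0), "CVE-2021-45046"),
    ((2, 17, 0), "CVE-2021-45105"),
    ((2, 17, 1), "CVE-2021-44832"),
]


def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1); qualifiers such as '-beta9' are dropped."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version: {text!r}")
    numbers = [int(n) for n in match.group().split(".")][:3]
    return tuple(numbers + [0] * (3 - len(numbers)))


def open_cves(version_text):
    """List the four December 2021 CVEs still unfixed in a Log4j version."""
    version = parse_version(version_text)
    if version[0] < 2:
        # 1.x is end of life and has other issues, but not these four.
        return []
    return [cve for fixed, cve in FIXED_IN if version < fixed]
```

For example, `open_cves("2.16.0")` returns `['CVE-2021-45105', 'CVE-2021-44832']`, while `open_cves("2.17.1")` returns an empty list. A real SCA tool works from a maintained vulnerability database rather than a hard-coded table like this one.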
As soon as the alert came out, our industry acted. Incident responders immediately started contacting vendors, monitoring networks, researching, and contacting peers for more ideas. Application security teams worked with software developers to find out if any of their custom code had the affected libraries. CISOs started issuing statements to the media. And the entire industry waited for a patch to be released.
What was the root cause of this situation?
A very large percentage of all the applications in the world contain at least some open-source components, which generally have little-to-no budget for security activities, including security testing and code review. On top of this, even for-profit organizations that create software often have anywhere from acceptable to abysmal security assurance processes for the products and components they release. The part of our industry that is responsible for the security of software, often known as application security, is failing.
- Little-to-no financial support for open-source software means there is usually no budget for security.
- Because there are not enough qualified people in the field of application security, it is extraordinarily expensive to engage a skilled expert to do this work.
- No regulation or laws controlling or addressing security in IT systems in most countries means this industry runs without governmental influence or regulation.
- Although there are some groups (such as NIST and OWASP) trying to create helpful frameworks for software creators to work within, there is no mandate for any person or organization to use them.
- The security of software is not taught in most colleges, universities, or boot camps, meaning we are graduating new software engineers who do not know how to create secure applications, test applications for security, or recognize and correct many of the security issues they may encounter.
- Education for software security is extremely expensive in the Westernized world, pricing it out of reach for most software developers and even organizations.
Because companies do not share this information, it is impossible to state specific costs and damages. That said, after speaking to a few sources who wish to remain anonymous, the following is likely true:
- Damages are estimated in the hundreds of millions, for the industry world-wide.
- Hundreds of thousands of hours of logged overtime, most likely resulting in or contributing to incident responder employee burnout.
- Many organizations only applied this one patch and went back to business as usual. That said, some used this situation as an opportunity to create projects to simplify the patching process and/or the software release process, to ensure faster reaction times in the future.
- Many companies that previously did not think supply chain security was important have updated their views, and hopefully also their toolset and processes.
- When unable to create an accurate cost estimate, ‘guesstimates’ are often accepted.
Time to Detection?
- Most companies (according to anonymous sources and online discussion) spent 2-3 straight weeks working on this issue, dropping all other priorities for the InfoSec teams and most other priorities for those applying patches and scanning.
- Detection in 3rd party applications and SaaS was extremely difficult, as many organizations issued statements that they were unaffected, only to find out later they had been incorrect/uninformed.
- Generally, most incident response teams responded on the day of the announcement.
- AppSec teams checked their SCA tools and code repositories for the offending library and asked for patches/upgrades where necessary.
- CDNs, WAFs and RASPs were updated with blocking rules.
- Those managing servers searched dependencies and patched, feverishly.
- Those managing operating systems, middleware and SaaS wrote vendors to ask for status reports.
- Incident responders managed all activities, often leading the search efforts.
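The server-side search described above often came down to looking inside Java archives for the class at the center of the exploit. Here is a minimal sketch of that idea, assuming a plain directory of archives; `find_log4j_core` is a hypothetical helper, and it does not look inside nested archives (for example, jars bundled within a war file) or shaded jars that relocate the package, both of which real scanners had to handle.

```python
import zipfile
from pathlib import Path

# The class that performs the JNDI lookup inside log4j-core 2.x.
JNDI_LOOKUP = "org/apache/logging/log4j/core/lookup/JndiLookup.class"
ARCHIVE_SUFFIXES = {".jar", ".war", ".ear"}


def find_log4j_core(root):
    """Return sorted paths of archives under root that contain JndiLookup.class."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in ARCHIVE_SUFFIXES:
            continue
        try:
            with zipfile.ZipFile(path) as archive:
                names = archive.namelist()
        except (zipfile.BadZipFile, OSError):
            # Unreadable, or not actually a zip archive; skip it.
            continue
        if JNDI_LOOKUP in names:
            hits.append(str(path))
    return sorted(hits)
```

A hit only says the class is present, not that the version is vulnerable, so each result still needs a version check and, for third-party products, a question to the vendor.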
Lessons Learned? Opportunities for Improvement?
What follows are the author’s ideas for lessons learned. Each organization is different, but below is a list of potential lessons learned by any organization.
- Patching processes for operating systems, middleware, commercial off-the-shelf (COTS) software, and custom software must be improved. The main threat to organizations from this type of vulnerability is slow updates/upgrades/patches that leave them open to compromise for extended periods of time.
- An incomplete inventory of software assets is a large threat to any business; we cannot protect what we do not know we have. This includes a software bill of materials (SBOM). Software asset inventory must be prioritized.
- Organizations that learned later, rather than earlier, about this vulnerability were at a distinct disadvantage. Subscribing to various threat intelligence and bug alert feeds is mandatory for any large enterprise.
- Many incident response teams and processes did not have provisions for software vulnerabilities. Updating incident response processes and team training to include this type of threat is mandatory for medium to large organizations.
- Most service level agreements (SLAs) did not cover such a situation and updating these with current vendors would be a ‘nice to have’ for future, similar, situations. Adding this to vendor questions in the future would be an excellent negotiation point.
- Many custom software departments were unprepared to find which applications did and did not contain this library. Besides creating SBOMs and an inventory, deploying a software composition analysis tool to monitor custom applications and their dependencies would have simplified this situation for any dev department.
- Many organizations with extensive technical debt found themselves in a situation where it would require a re-architecting of their application(s) in order to upgrade off of the offending library. Addressing deep technical debt is paramount in building the ability to respond to dependency-related vulnerabilities of this magnitude.
- There are hundreds of thousands of open-source libraries all over the internet, littered with both known and unknown vulnerabilities. This problem is not new, but this specific situation has brought this issue into the public eye in a way that previous vulnerabilities have not. Our industry and/or governments must do something to ensure the safety and security of these open-source components that software creators around the world use every day. The current system is neither safe nor reliable.
What went well?
- Incident response teams worked quickly and diligently to triage, respond to, and eradicate this issue.
- Operational and software teams responsible for patching and upgrading systems performed heroically, in many organizations.
- Multiple vendors went above and beyond in assisting their customers and the public, responding quickly and completely to this issue.
What could have gone better?
- Messaging was confused at times, as few knew the extent of this issue at first.
- The media released many articles that emphasized fear, uncertainty, and doubt (FUD), rather than helpful facts, creating panic when it was not necessary.
- Companies that produce custom software, but did not have application security resources, were left at a distinct disadvantage, unaware of what to do for the first few days (before articles with explicit instructions were available).
- Many vendors issued statements that were just not true. “Our product could have stopped this” and “you would have known before everyone else if you had just bought us”, etc. Although there are some products that may have been able to block such an attack without additional configurations, they were few and far between compared to the number of vendors claiming this to be true of their own product(s).
Action Items
- Improve infrastructure, middleware, and COTS patching and testing processes.
- Improve custom software release processes.
- Request a Software Bill of Materials (SBOM) for all software, including purchased products and those which are home-grown.
- Create a software asset inventory and create processes to ensure it continues to be up to date. This should include SBOM information.
- Subscribe to threat feeds and bug alerts for all products you own, as well as programming languages and frameworks used internally.
- Train your incident response team and/or AppSec team to respond to software-related security incidents.
- For companies that build custom software: Install a software composition analysis tool and connect it to 100% of your code repos. Take the feedback from this tool seriously and update your dependencies accordingly.
- Negotiate SBOMs and Service Level Agreements (SLAs) on patching for all new COTS and middleware products your organization purchases. Attempt to negotiate these after the fact for contracts you already have in place.
- Do your best to keep your dependencies reasonably up to date and address technical debt in a reasonable way. If your entire application needs to be re-architected just to update a single dependency to current, this means your technical debt is currently unacceptable. Make time for maintenance now, rather than waiting for it to “make time for you”, later.
- Create a process for evaluating and approving all dependencies used in custom software, not just open-source ones. A software composition analysis tool can help with both implementing and documenting this process.
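To illustrate why SBOMs and a software asset inventory matter, here is a minimal sketch of answering "which of our applications ship this library?" from a folder of SBOM files. It assumes one CycloneDX-style JSON file per application, each with a top-level `components` list whose entries carry `name` and `version` fields; the directory layout and the `components_named` helper are hypothetical, and nested components are not traversed.

```python
import json
from pathlib import Path


def components_named(sbom_dir, name):
    """Map each SBOM file name to the versions of `name` it lists."""
    found = {}
    for sbom_path in sorted(Path(sbom_dir).glob("*.json")):
        document = json.loads(sbom_path.read_text(encoding="utf-8"))
        versions = sorted(
            component.get("version", "unknown")
            for component in document.get("components", [])
            if component.get("name") == name
        )
        if versions:
            found[sbom_path.name] = versions
    return found
```

Calling `components_named("sboms", "log4j-core")` on such a folder would return only the applications that list the library, along with the versions they use. With an inventory like this in place, the first question of an incident becomes a lookup rather than a multi-week search.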
Could we have known about this sooner?
The very frustrating question that incident responders have been asked over and over again since this happened is, “Could we have known about this sooner?” The answer, unfortunately, is probably not. Not with the way we, as an industry, treat open-source software.
Could we have responded better?
This is a question only your organization can answer for itself. That said, reviewing the ‘Lessons Learned’ section and implementing one or more of the ‘Action Items’ in this article could certainly help.
How can we stop this from happening again?
Our industry needs to change the way we manage open-source libraries and other 3rd-party components. This is not something the author can answer as a single person. The industry as a whole must push for real and lasting change. One person is not enough.
It is likely that much of our industry will remain unchanged from this major security incident. That said, it is the author’s hope that some organizations and individuals changed for the better, prioritizing fast and effective patching and upgrading processes, and the repayment of technical debt, setting themselves apart from others as leaders in this field.