A gentle introduction to static code analysis
I still remember the days when writing tests wasn’t a common practice. Untested code used to work in production, until something went wrong. Eventually, the idea that code should be tested to ensure that changes didn’t break anything became widespread.
Interestingly, testing didn’t catch on simultaneously in all languages and platforms, but spread gradually. Tool makers transferred good practices from one area to another.
I see something similar happening with static code analysis. In some areas, it’s more common and even required by law, while in others it’s not yet fully adopted. However, surveys suggest that about half of developers now use static analysis, up from about 40% in 2021. I believe this trend will continue, and eventually static analysis will become as commonplace as writing tests.
What is static code analysis?
Static code analysis is the process of examining source code (without actually executing it) to identify potential defects, security vulnerabilities, and other quality issues. Static analysis can help you improve the quality and reliability of software by detecting issues early in the development cycle, which can lead to cost savings and reduced time-to-market.
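To make "examining source code without executing it" concrete, here is a minimal sketch in Python. It parses a snippet into a syntax tree with the standard ast module and looks for two well-known problem patterns. The sample snippet, the function name find_issues, and the choice of checks are my own illustrative assumptions, not how any particular tool works.

```python
import ast

# Hypothetical snippet to analyze. It is only parsed, never executed.
SOURCE = '''
def load(path, cache={}):
    try:
        return open(path).read()
    except:
        return None
'''


def find_issues(source: str) -> list[tuple[int, str]]:
    """Return (line, message) pairs found by inspecting the syntax tree only."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # A bare "except:" also catches KeyboardInterrupt and SystemExit.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append((node.lineno, "bare except"))
        # A mutable default argument is shared between calls.
        if isinstance(node, ast.FunctionDef):
            for default in node.args.defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    issues.append((default.lineno, "mutable default argument"))
    return sorted(issues)


if __name__ == "__main__":
    for line, message in find_issues(SOURCE):
        print(f"line {line}: {message}")
```

Note that open() is never called here: the file does not need to exist, because the analysis only reads the text of the program.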
While static code analysis offers extensive insights into code, it is also worth observing that complexity and cost can be barriers to its adoption. Often, alternative methods such as testing or direct program execution are more practical, and strike a different balance between effectiveness and complexity.
There are several alternatives to static code analysis including dynamic analysis and manual code review. Dynamic analysis involves running the code and observing its behavior to identify issues. Dynamic analysis can be effective in detecting issues related to performance and security, but requires a running application and can be time-consuming.
Manual code review involves having humans examine the code to identify issues. This too can be effective, but also can be time-consuming, error-prone, and subjective.
Static code analysis capabilities
Static code analysis is carried out using automated tools that apply a set of rules and algorithms to detect problems in a codebase. It can be applied to a number of distinct programming areas and objectives. We will review those briefly in the next paragraphs.
Code style and formatting
Enforcing a set of conventions on code formatting helps improve code readability and consistency across a project. Code style is usually enforced by integrated code quality systems, such as SonarQube, JetBrains Qodana, GitLab Code Quality, and Codacy. If an organization has adopted a code quality system with no support for code style checks, developers can choose dedicated tools specific to their programming language.
For example, ESLint is extremely popular in the JavaScript ecosystem, and it has also taken over the role of TSLint, which was deprecated in favor of ESLint with TypeScript support. Prettier is known for its wide language support, customization options, and ease of use. StyleCop covers the needs of .NET developers.
Probable bugs and data flow analysis
Static analysis helps prevent issues such as null pointer dereference, divide-by-zero errors, infinite loops, unused branches in logical expressions, errors in regular expressions, suboptimal code, resource leaks, and so on.
As with the previous category, those issues can be caught either by general-purpose tools (SonarQube, Qodana, GitLab Code Quality, Codacy) or by dedicated, language-specific tools. For example, FindBugs (now continued as SpotBugs) and PMD are popular for Java, and Roslyn Analyzers for .NET.
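As a rough illustration of how a null dereference can be spotted, the sketch below tracks which local variables were most recently assigned None and flags attribute access on them. It is a toy approximation of data flow analysis under strong assumptions: it only follows straight-line code and forgets everything it knows when it meets a branch or loop. The names find_none_dereferences and SAMPLE are hypothetical, and real analyzers are far more thorough.

```python
import ast

# Hypothetical snippet to analyze. "lookup" does not need to exist.
SAMPLE = '''
def greet(user_id):
    user = None
    print(user.name)
    user = lookup(user_id)
    return user.name
'''

SIMPLE_STATEMENTS = (ast.Assign, ast.Expr, ast.Return)


def find_none_dereferences(source: str) -> list[tuple[int, str]]:
    """Return (line, variable) pairs where a variable known to be None is dereferenced."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        known_none: set[str] = set()
        for stmt in func.body:
            if not isinstance(stmt, SIMPLE_STATEMENTS):
                # Branches and loops are not modeled: drop all facts to stay conservative.
                known_none.clear()
                continue
            # First look at how variables are used in this statement ...
            for node in ast.walk(stmt):
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.value, ast.Name)
                        and node.value.id in known_none):
                    findings.append((node.lineno, node.value.id))
            # ... then record what the statement assigns.
            if isinstance(stmt, ast.Assign):
                assigns_none = (isinstance(stmt.value, ast.Constant)
                                and stmt.value.value is None)
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        if assigns_none:
                            known_none.add(target.id)
                        else:
                            known_none.discard(target.id)
    return findings


if __name__ == "__main__":
    for line, name in find_none_dereferences(SAMPLE):
        print(f"line {line}: '{name}' is None here")
```

The first access to user.name is reported, while the second is not, because by then the variable has been reassigned.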
Code duplication detection
Most general-purpose analyzers can detect code duplication, but there are also dedicated tools for this purpose. Duplication detection is often overlooked, yet it is an important part of code maintenance.
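One simple way to think about duplication detection is a sliding window over normalized lines: any window of text that appears more than once is a candidate duplicate. The sketch below assumes a window of three non-blank lines and whitespace-only normalization. The function name find_duplicates and the sample are illustrative, and production tools usually compare tokens or syntax trees instead of raw lines.

```python
from collections import defaultdict

# Hypothetical snippet containing one copy-pasted block.
SAMPLE = """\
total = 0
for item in cart:
    total += item.price
print(total)

total = 0
for item in cart:
    total += item.price
log(total)
"""


def find_duplicates(source: str, window: int = 3) -> list[list[int]]:
    """Return groups of 1-based line numbers where identical runs of `window` non-blank lines start."""
    # Keep original line numbers, drop blank lines, collapse whitespace.
    lines = [(number, " ".join(text.split()))
             for number, text in enumerate(source.splitlines(), start=1)
             if text.strip()]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        seen[chunk].append(lines[i][0])
    return [starts for starts in seen.values() if len(starts) > 1]


if __name__ == "__main__":
    for starts in find_duplicates(SAMPLE):
        print("duplicate block starting at lines", starts)
```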
Third-party license audit
The primary goal of third-party license audit tools is to automatically detect and identify the licenses of the third-party components used in your project.
These tools often analyze package metadata, license files, and even source code comments to determine the applicable licenses. They often also provide a license inventory to help ensure compliance with legal obligations and company policies. The report produced by such tools can be shared with stakeholders and used for decision-making and compliance documentation.
Popular tools for third-party license audits include FOSSA, WhiteSource (now Mend), Black Duck, and Snyk, which offer comprehensive features and capabilities. However, if you are looking for a simpler and more lightweight solution to get started, there are alternative options. For example, many package managers offer built-in commands or plugins that can generate a list of dependencies and their associated licenses.
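As an example of the lightweight approach, the following sketch builds a license inventory for a Python environment from package metadata, using only the standard importlib.metadata module. The helper names describe_license and license_inventory are my own, and the result is only as good as the metadata that package authors declare, so treat it as a starting point and not as a legal audit.

```python
from importlib import metadata


def describe_license(dist: metadata.Distribution) -> str:
    """Best-effort license description taken from declared package metadata."""
    meta = dist.metadata
    declared = meta.get("License-Expression") or meta.get("License")
    if declared and str(declared).strip():
        # Some packages paste the full license text here: keep the first line only.
        return str(declared).strip().splitlines()[0]
    # Fall back to trove classifiers such as "License :: OSI Approved :: MIT License".
    classifiers = meta.get_all("Classifier") or []
    licenses = [c.split("::")[-1].strip()
                for c in classifiers if c.startswith("License ::")]
    return ", ".join(licenses) or "UNKNOWN"


def license_inventory() -> dict[str, str]:
    """Map each installed distribution name to its declared license."""
    inventory = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            inventory[name] = describe_license(dist)
    return dict(sorted(inventory.items()))


if __name__ == "__main__":
    for name, license_name in license_inventory().items():
        print(f"{name}: {license_name}")
```

Entries reported as UNKNOWN are the ones worth checking by hand.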
Security
Security is a huge topic, spanning hundreds of types of coding issues that should be prevented. Those can be divided into two major groups: source code security and build chain security.
Security vulnerabilities, weaknesses, and flaws within the source code can expose applications to SQL injection, cross-site scripting (XSS), buffer overflows, and other types of attacks. Weaknesses in the build chain and dependency security can lead to dependency confusion or supply chain attacks.
Build chain attacks compromise the integrity of a software system by injecting malicious code or exploiting vulnerabilities in third-party components. To mitigate these risks, developers should regularly audit their dependencies, ensure that they use trusted sources for libraries and frameworks, and implement robust access control and monitoring throughout the software development lifecycle.
Gartner’s Magic Quadrant for Application Security Testing, which covers SAST (static application security testing), has named Synopsys and Checkmarx among the leaders in this category, but there are also many smaller players. Decisions regarding which tools to use always come down to risks, budget, goals, and circumstances.
Static code analysis tools
Static analysis tools can generally be divided into two main categories: those that developers run on their local machines, and those that are integrated into the development pipeline.
The first group includes tools that are typically integrated into modern IDEs, as well as standalone linters that can be run locally. These tools are designed to help developers catch issues early in the development process. They provide instant feedback, which is ideal, but they often can’t catch more complex issues. It is also hard to ensure that everyone on the team uses them.
The second group is much broader and includes a variety of tools that are integrated into the development pipeline at the server level. These tools range from simple linters that are executed remotely as part of the build chain, to more complex solutions that are installed separately and dedicated to multi-layer analysis (i.e., different types of analysis that are performed at different stages of the development cycle).
Also, whereas the first group of tools targets developers exclusively, the second group targets a broader audience, which can vary from developers to team managers, from security teams to DevOps, and so on. Both categories are important, and how well the tools integrate with each other plays a crucial role.
Static analysis tools can also be categorized based on several factors, which we discuss below.
Programming language support
Different static analysis tools support different programming languages. Some tools are designed for a specific language, such as Pylint for Python or ESLint for JavaScript, while others, like SonarQube, support multiple languages.
Analysis techniques
Static analysis tools may employ different techniques to analyze code, such as pattern matching, data flow analysis, control flow analysis, or abstract interpretation. These techniques can vary in complexity and effectiveness, affecting the tool’s ability to detect issues.
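To give a feel for one of these techniques, here is a very small control flow check: any statement that directly follows return, raise, break, or continue in the same block can never run. This is a sketch under simplifying assumptions (it looks at one block at a time and does not build a full control flow graph), and the names find_unreachable and SAMPLE are hypothetical.

```python
import ast

# Hypothetical snippet with two unreachable statements.
SAMPLE = '''
def parse(value):
    if value is None:
        raise ValueError("missing")
        print("never shown")
    return int(value)
    value = 0
'''

TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def find_unreachable(source: str) -> list[int]:
    """Return line numbers of statements that follow a terminating statement in the same block."""
    unreachable = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is a single expression, not a block
            for current, following in zip(block, block[1:]):
                if isinstance(current, TERMINATORS):
                    unreachable.append(following.lineno)
                    break  # report only the first dead statement per block
    return sorted(unreachable)


if __name__ == "__main__":
    for line in find_unreachable(SAMPLE):
        print(f"line {line}: unreachable code")
```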
Rules and coding standards
Each static analysis tool comes with a set of rules or coding standards that it checks, which can differ significantly across tools. Some tools focus on specific coding standards like MISRA for C/C++ or PSR for PHP, while others offer more general checks.
Customizability
Some static analysis tools allow users to customize the analysis by adding or modifying rules, enabling the tool to focus on specific concerns or adhere to organization-specific coding standards. These extensions could be as simple as adding your own regular expression to check for certain cases, but they range all the way up to full-scale plugins with complex functionality.
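At the simple end of that range, a custom rule can be little more than a named regular expression. The sketch below shows what a pair of organization-specific rules might look like. The rule IDs, the ticket format, and the Rule and check names are invented for illustration and do not correspond to any real tool's plugin API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern
    message: str


# Hypothetical organization-specific rules.
CUSTOM_RULES = [
    # A TODO must be followed by a ticket reference such as (PROJ-12).
    Rule("ORG001", re.compile(r"\bTODO\b(?!\s*\([A-Z]+-\d+\))"),
         "TODO without a ticket reference"),
    Rule("ORG002", re.compile(r"\bprint\("),
         "use the logging module instead of print"),
]


def check(source: str, rules=CUSTOM_RULES) -> list[tuple[int, str, str]]:
    """Apply every rule to every line and return (line, rule_id, message) tuples."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if rule.pattern.search(line):
                findings.append((number, rule.rule_id, rule.message))
    return findings


if __name__ == "__main__":
    code = "x = 1  # TODO (PROJ-12) tidy up\nprint(x)  # TODO later\n"
    for number, rule_id, message in check(code):
        print(f"line {number}: {rule_id} {message}")
```

Line-based regular expressions are easy to write but know nothing about syntax, so they will also match inside strings and comments. That limitation is one reason tools offer full plugin interfaces as well.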
Integration and automation
Tools can vary as to ease of integration with development environments, build systems, and continuous integration pipelines. Some tools offer plugins or APIs to facilitate integration, while others require manual configuration.
User interface and reporting
How a tool presents its findings has a direct effect on its usability. Some tools provide user-friendly, web-based interfaces, while others generate reports in various formats like XML, JSON, or HTML. The level of detail and filtering and sorting options can also differ between tools.
Performance and scalability
How fast a static analysis tool can analyze code and its ability to handle large codebases can impact its suitability for different projects.
Licensing and cost
Static analysis tools can be open-source, free, or commercial, with varying levels of support and features. Open-source tools like Pylint or ESLint are free to use, while commercial tools like Coverity or Klocwork often provide more advanced features, support, and updates at a cost. Many vendors, including SonarSource and JetBrains, offer both a free product and a more sophisticated paid solution.
Benefits of static code analysis
Besides code quality improvement, static analysis brings a few other valuable benefits.
Planning ahead
One of the most valuable, yet often overlooked, aspects of static analysis is the ability to plan ahead. Rather than simply fixing issues that already exist, developers can use static analysis to estimate the amount of work required before switching to a new library, language version, or framework. By integrating an issue tracker, teams can easily split those issues among members and track progress over time.
In addition, static analysis tools can help developers evaluate code that they may know little about. This is particularly useful when working with third-party or subcontracted code. With static analysis, developers can quickly evaluate the quality and security of the code, identify any potential issues, and take steps to mitigate risks.
GDPR compliance
The next business benefit worth mentioning is GDPR compliance. While many people know that static analysis can help ensure code compliance with certain regulations, GDPR is not the first to come to mind.
For example, you can create checks that prevent developers from writing personal user data into application logs in a way that would be incompatible with the regulation. This could save a lot of time and effort later when dealing with authorities.
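A check of this kind could be sketched as follows. It scans logging calls and reports arguments whose names suggest personal data. The list of sensitive names is purely hypothetical, and the sketch assumes that any call to a method named like a logging level (info, debug, and so on) is a logging call, so a real check would need tuning to your codebase and to your own reading of the regulation.

```python
import ast

# Hypothetical identifiers considered personal data in this codebase.
PERSONAL_DATA_NAMES = {"email", "phone", "ssn", "password", "full_name"}
# Assumption: a call to any method with one of these names is a logging call.
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}

SAMPLE = '''
logger.info("login from %s", user.email)
logger.debug("cart size: %d", len(cart))
logger.warning(f"reset requested for {phone}")
'''


def find_personal_data_in_logs(source: str) -> list[tuple[int, str]]:
    """Return (line, identifier) pairs where a suspicious name is passed to a logging call."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LOG_METHODS):
            continue
        arguments = list(node.args) + [kw.value for kw in node.keywords]
        for argument in arguments:
            # Walk into f-strings, attribute chains, and nested calls.
            for inner in ast.walk(argument):
                name = None
                if isinstance(inner, ast.Name):
                    name = inner.id
                elif isinstance(inner, ast.Attribute):
                    name = inner.attr
                if name in PERSONAL_DATA_NAMES:
                    findings.add((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    for line, name in find_personal_data_in_logs(SAMPLE):
        print(f"line {line}: possible personal data in log: {name}")
```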
Team collaboration
A third important aspect of static analysis is its team-oriented nature. When used on a server, static analysis can help ensure everyone follows the same coding standards and best practices. It also spreads knowledge, facilitates code reviews, and minimizes manual work. I expect to see more team applications of static analysis in the future.
How to choose static analysis tools
When a company decides to select a static analysis tool or tools, the journey should begin by considering factors such as goals, budget, time frame, target audience, the willingness to modify existing workflows, and the overall operational landscape.
Keep in mind that even the perfect tool will not be effective if the people involved are not willing to invest their effort.
In industries like automotive or medical, where regulations often mandate the use of specific coding standards and static analysis tools, the choice of tools is driven by compliance requirements. In these regulated areas you’ll find tools such as Polyspace, Coverity, and Parasoft, and industry standards such as MISRA C, MISRA C++, ISO 26262 (Automotive), DO-178C (Aerospace), and IEC 61508 (Functional Safety).
In more flexible situations, success depends more on the organizational culture than on the tools themselves. However, the right tools can certainly facilitate the process and make it more manageable.
Here are some factors to evaluate before choosing a static analysis tool:
- Goals and objectives: Identify the primary goals of using a static analysis tool and how you will measure success.
- Language support: A multi-language tool provides consistency, streamlines the learning curve, and unifies reporting and tracking.
- Integration with your ecosystem: The ability to integrate with your IDE and continuous integration server is crucial.
- Customizability: It is also critically important that the tool allows customization of rules and profiles to suit your specific needs.
- Cost and licensing: Consider the trade-offs between open-source, free, and commercial tools, taking into account features, support, and updates provided by each of them.
- Performance and scalability: Assess the tool’s ability to handle large codebases and analyze code quickly without impacting development time.
- Reporting and user interface: Examine the tool’s reporting capabilities, user interface, and ease of use.