Sym Piracha

Scanning code to make it safer

My first professional job was at a cybersecurity startup, and it exposed me to a world that university courses had never touched: securing code and understanding vulnerabilities. As a student, security hadn’t crossed my mind, but I quickly realized it couldn’t be an afterthought. Security needed to be integrated early in the software development process, a concept known as “shifting left”. Instead of waiting until code reaches production, where fixes are costly, security should be part of the software development lifecycle (SDLC) from the start. Catching vulnerabilities while writing or testing code is far less expensive and prevents potentially catastrophic issues from ever reaching production.

One of the first tools I encountered that embodied this philosophy was a Static Application Security Testing (SAST) tool.

What Is SAST?

A SAST tool analyzes source code, bytecode, or binaries without executing them. It scans your codebase for patterns that may indicate vulnerabilities. These can range from simple issues, like hardcoded credentials, to complex ones, such as SQL injection paths where untrusted input reaches sensitive queries.

Unlike dynamic testing, which observes runtime behavior, SAST inspects the code itself. This enables vulnerabilities to be detected as soon as code is written, long before deployment.

When integrated into a CI/CD pipeline, SAST provides near-immediate feedback. Developers can catch issues during commits or pull requests, reducing the cost and risk of vulnerabilities slipping into production.

How SAST Works

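SAST tools generally follow four steps: parsing the code, analyzing data and control flow, applying the rules engine, and reporting findings.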


Parsing the Code

The first step is parsing the source code, bytecode, or binaries into one or more intermediate representations (IRs). At minimum, this usually includes an Abstract Syntax Tree (AST), which captures the code’s syntactic structure, such as expressions, statements, and blocks.

Many tools also build a Control Flow Graph (CFG) to represent possible execution paths. These representations provide the foundation for all subsequent security analysis.

Example

Consider this Java snippet:

String userInput = request.getParam("name");
String query = "SELECT * FROM users WHERE name = '" + userInput + "'";
database.execute(query);

The AST represents the structure:

Assignment
 ├─ Variable: userInput
 └─ Expression: FunctionCall(request.getParam, "name")

Assignment
 ├─ Variable: query
 └─ Expression: Concatenation
      ├─ "SELECT * FROM users WHERE name = '"
      ├─ userInput
      └─ "'"

FunctionCall
 ├─ Function: database.execute
 └─ Argument: query

The CFG captures the flow of execution:

[Start] --> [userInput = request.getParam("name")]
           --> [query = "SELECT ... " + userInput]
           --> [database.execute(query)]
           --> [End]
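To make the idea of "walking" an AST concrete, here is a minimal sketch that builds the tree for the snippet by hand and collects every function call in it. The AstNode and AstWalk classes are hypothetical and only for illustration; a real SAST tool would get the tree from a parser rather than constructing it manually.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified AST node: a kind, an optional value, and children.
class AstNode {
    final String kind;   // e.g. "Assignment", "FunctionCall"
    final String value;  // e.g. a variable or function name; may be empty
    final List<AstNode> children = new ArrayList<>();

    AstNode(String kind, String value, AstNode... kids) {
        this.kind = kind;
        this.value = value;
        for (AstNode k : kids) children.add(k);
    }
}

class AstWalk {
    // Build the tree for the three-statement snippet by hand.
    static AstNode buildExample() {
        AstNode getParam = new AstNode("FunctionCall", "request.getParam",
                new AstNode("Literal", "\"name\""));
        AstNode concat = new AstNode("Concatenation", "",
                new AstNode("Literal", "\"SELECT * FROM users WHERE name = '\""),
                new AstNode("Variable", "userInput"),
                new AstNode("Literal", "\"'\""));
        AstNode execute = new AstNode("FunctionCall", "database.execute",
                new AstNode("Variable", "query"));
        return new AstNode("Block", "",
                new AstNode("Assignment", "userInput", getParam),
                new AstNode("Assignment", "query", concat),
                execute);
    }

    // Depth-first walk that collects the name of every function call.
    static List<String> collectCalls(AstNode node) {
        List<String> calls = new ArrayList<>();
        if (node.kind.equals("FunctionCall")) calls.add(node.value);
        for (AstNode child : node.children) calls.addAll(collectCalls(child));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(collectCalls(buildExample()));
        // prints [request.getParam, database.execute]
    }
}
```

The same kind of traversal is what lets later stages ask questions such as "which calls in this file could be sources or sinks?".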

Analyzing Data and Control Flow

Once the code is parsed into intermediate representations, the tool analyzes data propagation and execution paths.

  • Data flow analysis tracks variables and inputs across functions and modules.
  • Control flow analysis maps possible execution paths.
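As a rough illustration of what "mapping execution paths" means, the sketch below stores a small CFG as an adjacency map and enumerates every path from start to end. The graph and its node names (readInput, branch, sanitize, and so on) are made up for this example, and the code assumes the graph has no loops. Production tools typically avoid enumerating paths and use fixed-point data flow algorithms instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CfgPaths {
    // Hypothetical CFG: node name -> successor node names.
    static Map<String, List<String>> exampleCfg() {
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("start", List.of("readInput"));
        cfg.put("readInput", List.of("branch"));
        cfg.put("branch", List.of("sanitize", "buildQuery")); // if/else split
        cfg.put("sanitize", List.of("buildQuery"));
        cfg.put("buildQuery", List.of("execute"));
        cfg.put("execute", List.of("end"));
        cfg.put("end", List.of());
        return cfg;
    }

    // Enumerate every path from `node` to "end" (assumes an acyclic graph).
    static List<List<String>> allPaths(Map<String, List<String>> cfg, String node) {
        List<List<String>> paths = new ArrayList<>();
        if (node.equals("end")) {
            paths.add(new ArrayList<>(List.of("end")));
            return paths;
        }
        for (String next : cfg.get(node)) {
            for (List<String> tail : allPaths(cfg, next)) {
                List<String> path = new ArrayList<>();
                path.add(node);
                path.addAll(tail);
                paths.add(path);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        for (List<String> path : allPaths(exampleCfg(), "start")) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

In this made-up graph, one of the two paths skips the sanitize node, which is exactly the kind of path a scanner needs to notice.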

Why It Matters

Consider a variant of the earlier snippet that sanitizes the input before building the query:

String userInput = request.getParam("name");
String sanitized = sanitize(userInput);
String query = "SELECT * FROM users WHERE name = '" + sanitized + "'";
database.execute(query);

  • The AST and CFG show assignments and function calls, but not whether a value is safe.
  • Detecting SQL injection requires knowing whether userInput reaches query and whether it was sanitized along the way.
  • Without data flow analysis, real vulnerabilities might be missed, or safe code could be flagged unnecessarily.

Data Flow Analysis Explained

Data flow analysis answers:

For each variable or value, where does it come from, and where can it go?

Security scanning tracks “tainted” data (input from untrusted sources) and determines whether it reaches sensitive sinks such as:

  • Database queries
  • File system operations
  • System commands
  • Network output

How Analysis Works

  1. Identify sources and sinks: Mark where untrusted input enters (sources) and sensitive operations occur (sinks).
  2. Taint variables: Mark variables receiving untrusted input as tainted.
  3. Propagate taint: Track how tainted data moves through assignments, function calls, and expressions.
  4. Consider control flow: Analyze all execution paths, including branches, loops, and conditions.
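The first three steps can be sketched in a few dozen lines. The example below is a simplified illustration, not how any particular tool works: it assumes the program has already been lowered to a flat list of statements, it handles only straight-line code (no branches or loops, so step 4 is out of scope), and the Stmt class and the source, sanitizer, and sink names are assumptions taken from the snippets in this article.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TaintSketch {
    // Hypothetical statement form: target = call(args) or target = expr(args).
    // For a bare call such as database.execute(query), target is null.
    static final class Stmt {
        final String target;
        final String call;       // function name, or null for a plain expression
        final List<String> args; // variable names read by the statement
        Stmt(String target, String call, String... args) {
            this.target = target;
            this.call = call;
            this.args = List.of(args);
        }
    }

    // Step 1: identify sources and sinks (plus sanitizers that clear taint).
    static final Set<String> SOURCES = Set.of("request.getParam");
    static final Set<String> SANITIZERS = Set.of("sanitize");
    static final Set<String> SINKS = Set.of("database.execute");

    // Returns the sink calls that receive a tainted argument.
    static List<String> analyze(List<Stmt> program) {
        Set<String> tainted = new HashSet<>();
        List<String> findings = new ArrayList<>();
        for (Stmt s : program) {
            boolean readsTaint = s.args.stream().anyMatch(tainted::contains);
            if (s.call != null && SINKS.contains(s.call) && readsTaint) {
                findings.add(s.call);
            }
            if (s.target == null) continue;
            boolean resultTainted;
            if (s.call != null && SOURCES.contains(s.call)) resultTainted = true;          // step 2
            else if (s.call != null && SANITIZERS.contains(s.call)) resultTainted = false; // cleared
            else resultTainted = readsTaint;                                               // step 3
            if (resultTainted) tainted.add(s.target); else tainted.remove(s.target);
        }
        return findings;
    }

    static List<Stmt> unsafeProgram() {
        return List.of(
            new Stmt("userInput", "request.getParam"),
            new Stmt("query", null, "userInput"),
            new Stmt(null, "database.execute", "query"));
    }

    static List<Stmt> safeProgram() {
        return List.of(
            new Stmt("userInput", "request.getParam"),
            new Stmt("sanitized", "sanitize", "userInput"),
            new Stmt("query", null, "sanitized"),
            new Stmt(null, "database.execute", "query"));
    }

    public static void main(String[] args) {
        System.out.println(analyze(unsafeProgram())); // prints [database.execute]
        System.out.println(analyze(safeProgram()));   // prints []
    }
}
```

Handling branches would mean running the same propagation over a CFG and merging the taint sets wherever paths join, which is where most of the real complexity lives.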

Applying the Rules Engine

The rules engine decides whether a code pattern or data flow represents a vulnerability. It applies security rules and heuristics, often aligned with CWE or OWASP.

  • Syntactic rules detect simple patterns like hardcoded credentials or calls to dangerous functions such as eval().
  • Semantic rules leverage data/control flow to detect complex issues, e.g., tainted input reaching a SQL query.
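To show how little a purely syntactic rule needs, here is a rough sketch that flags hardcoded secrets with a regular expression. The SyntacticRule class, the list of suspicious name fragments, and the line-by-line matching are all assumptions made for this example. Real tools usually match against the AST rather than raw text, which avoids false matches inside comments and strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntacticRule {
    // Hypothetical rule: a String variable whose name hints at a secret
    // is assigned a non-empty string literal.
    static final Pattern HARDCODED_SECRET = Pattern.compile(
        "String\\s+(\\w*(?:password|secret|apikey)\\w*)\\s*=\\s*\"[^\"]+\"",
        Pattern.CASE_INSENSITIVE);

    // Returns "line N: variable" for each line that matches the rule.
    static List<String> scan(String source) {
        List<String> findings = new ArrayList<>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = HARDCODED_SECRET.matcher(lines[i]);
            if (m.find()) findings.add("line " + (i + 1) + ": " + m.group(1));
        }
        return findings;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "String dbPassword = \"hunter2\";",
            "String greeting = \"hello\";",
            "String password = System.getenv(\"DB_PASSWORD\");");
        System.out.println(scan(source)); // prints [line 1: dbPassword]
    }
}
```

The third line is not flagged because the value comes from a function call rather than a literal. A semantic rule would go further and ask where that value flows next.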

Example: Semgrep

Semgrep is a popular rules engine with lightweight data flow capabilities. You can write syntactic rules:

rules:
    - id: hardcoded-password
      languages: [java]
      patterns:
        - pattern: 'String $VAR = "...";'
        - metavariable-regex:
            metavariable: $VAR
            regex: (?i).*(password|passwd|secret).*
      message: "Hardcoded secret found"
      severity: ERROR

Or track simple data flows with taint mode, e.g., detecting SQL injection:

rules:
    - id: sql-injection
      languages: [java]
      mode: taint
      pattern-sources:
        - pattern: request.getParam(...)
      pattern-sanitizers:
        - pattern: sanitize(...)
      pattern-sinks:
        - pattern: database.execute(...)
      message: "Potential SQL injection vulnerability"
      severity: ERROR

Advanced SAST tools may implement deeper flow analysis to catch complex vulnerabilities across larger codebases.


Reporting Findings

After analysis, the tool generates a report detailing detected vulnerabilities, including:

  • Type of vulnerability (SQL injection, XSS, hardcoded secrets, etc.)
  • Location in code (file name, line number, function)
  • Severity ratings (low to critical)
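A report entry can be modeled as a small value object. The sketch below is one possible shape, not the format of any specific tool: the Finding fields mirror the three bullets above, and the file names, line numbers, and output layout are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Report {
    enum Severity { LOW, MEDIUM, HIGH, CRITICAL }

    // Hypothetical finding: vulnerability type, location, and severity.
    static final class Finding {
        final String type;
        final String file;
        final int line;
        final Severity severity;

        Finding(String type, String file, int line, Severity severity) {
            this.type = type;
            this.file = file;
            this.line = line;
            this.severity = severity;
        }

        String format() {
            return String.format("[%s] %s at %s:%d", severity, type, file, line);
        }
    }

    // Most severe findings first, so they are triaged first.
    static List<String> render(List<Finding> findings) {
        List<Finding> sorted = new ArrayList<>(findings);
        sorted.sort(Comparator.comparing((Finding f) -> f.severity).reversed());
        List<String> lines = new ArrayList<>();
        for (Finding f : sorted) lines.add(f.format());
        return lines;
    }

    // Made-up sample data for the demo.
    static List<Finding> sample() {
        return List.of(
            new Finding("Hardcoded secret", "Config.java", 7, Severity.HIGH),
            new Finding("SQL injection", "UserDao.java", 42, Severity.CRITICAL),
            new Finding("Verbose error message", "ErrorPage.java", 15, Severity.LOW));
    }

    public static void main(String[] args) {
        render(sample()).forEach(System.out::println);
    }
}
```

Sorting by severity is a design choice for this sketch. Tools may also group findings by file or by rule, depending on how teams prefer to triage.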

High-quality tools provide remediation advice, code examples, and references to standards like OWASP and CWE. Integration into CI/CD pipelines and IDEs allows developers to receive immediate feedback, reinforcing secure coding practices in real time.