Code Property Graph (CPG)

Multi-Layered Graphical Code Analysis Drives Unrivaled Accuracy, Speed, and Scale

ShiftLeft is not only our name; it embodies our philosophical approach to the products we deliver to the market. All ShiftLeft solutions leverage the ShiftLeft Code Property Graph (CPG). The CPG provides an extensible, multi-layered representation of each unique code version, including the various levels of abstraction—all presented in a single graph.

The CPG is essentially a “graph of graphs” that combines control flow graphs, call graphs, program dependency graphs, and directory structures. This multi-layered representation gives developers and analysts insight into what each version of their application can and cannot do, and into any scenarios that may pose a risk. The CPG represents raw programming language constructs, custom code, and open-source elements (OSS libraries, SDKs, and platforms), including classes, methods, fields, and abstract syntax trees, all merged into a joint data structure for analysis.

The unique insights from the CPG provide all ShiftLeft solutions with granular detail and a deep understanding of data flows. CPG mapping includes traversals within the various layers of code to rapidly identify sources of data leakage, critical vulnerabilities, and security/compliance violations early and across the entire software development lifecycle (SDLC)—from development to production. With this optimized view, both false positives and false negatives are greatly reduced, providing accuracy at an unprecedented speed and scale.

Invented by Dr. Fabian Yamaguchi, Chief Scientist of ShiftLeft, the ShiftLeft CPG extends the original open-source project, Joern, to provide a feature-rich, enterprise-grade experience across multiple programming languages. As a measure of the effectiveness of the approach, Joern was used to identify 18 vulnerabilities in the Linux kernel that were accepted and fixed. If the CPG can find 18 real vulnerabilities in one of the most commonly used and hardened code bases, imagine what it can do for your source code!

Multi-layered semantic graph rapidly connects the vulnerability dots

The CPG is a graph of graphs. It takes abstract syntax trees, control flow graphs, call graphs, program dependency graphs, and directory structures and merges them into a joint data structure that covers all of the code elements (custom code, open-source libraries, and commercial SDKs). The result is a single multi-layered, three-dimensional representation that summarizes the components and flows in each version of the code and displays information flows in the context of what the application is and is not supposed to do. From this perspective, vulnerabilities become easier to identify as anomalies.

This joint data structure provides a much deeper understanding of how the various components interact with each other, which in turn enables more effective analysis of the code. It is especially effective for identifying complex vulnerabilities that depend on a series of conditions spread across a number of components, which are difficult or impossible to discover using legacy SAST tools.
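The idea of merging several graphs over one set of nodes can be illustrated with a small sketch. This is a toy model written for this page, not the ShiftLeft or Joern implementation: the class `CodePropertyGraph`, the layer names `AST`, `CFG`, and `PDG`, the node ids, and the helper `tainted_sinks` are all assumptions chosen for illustration. It models the snippet `x = source(); if (x < MAX) sink(x);` and asks a question that needs two layers at once.

```python
from collections import defaultdict


class CodePropertyGraph:
    """Toy multi-layer graph: one shared node set, edges labeled by layer."""

    def __init__(self):
        self.nodes = {}                # node id -> property dict
        self.edges = defaultdict(set)  # (source id, layer) -> set of target ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, layer):
        self.edges[(src, layer)].add(dst)

    def out(self, node_id, layer):
        """Direct successors of a node within a single layer."""
        return sorted(self.edges.get((node_id, layer), ()))

    def reachable(self, start, layer):
        """All nodes reachable from start by following one layer's edges."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for nxt in self.edges.get((current, layer), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def tainted_sinks(graph, source_id):
    """Calls that run after the source (CFG) and depend on its data (PDG)."""
    after = graph.reachable(source_id, "CFG")
    dependent = graph.reachable(source_id, "PDG")
    return sorted(n for n in after & dependent
                  if graph.nodes[n]["kind"] == "CALL")


# Hypothetical nodes for: x = source(); if (x < MAX) sink(x);
cpg = CodePropertyGraph()
cpg.add_node("func", kind="METHOD", code="foo")
cpg.add_node("decl", kind="CALL", code="x = source()")
cpg.add_node("pred", kind="CONTROL", code="x < MAX")
cpg.add_node("sink", kind="CALL", code="sink(x)")

for child in ("decl", "pred", "sink"):
    cpg.add_edge("func", child, "AST")  # syntax layer
cpg.add_edge("decl", "pred", "CFG")     # control flow layer
cpg.add_edge("pred", "sink", "CFG")
cpg.add_edge("decl", "pred", "PDG")     # data dependence layer
cpg.add_edge("decl", "sink", "PDG")

print(tainted_sinks(cpg, "decl"))
```

Because every layer shares the same node ids, a query can intersect results from different layers without translating between separate data structures, which is the property the joint representation is meant to provide.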

Enabling lightning-fast scan times

Analyze up to 500,000 lines of code in less than 10 minutes to keep up with the needs of DevOps pipelines. This gives you the agility and confidence you need to analyze and secure every code release before you launch it. There is no need to skip analysis cycles and hope for the best.

Record-breaking accuracy

The CPG finds vulnerabilities with a high level of accuracy, including ones that other tools miss. By graphically mapping all of your unique code elements, including custom code, open-source libraries, and commercial SDKs, into various levels of abstraction, the CPG provides a much deeper understanding of how the various components of your code connect and interact with each other, quickly identifying areas of risk and vulnerability. The code analysis accuracy driven by the CPG was recently validated in the OWASP Benchmark, where ShiftLeft Inspect set the record with a score of 75%. Not only was this the highest SAST score ever recorded, but it doubled the score of the next closest commercial SAST vendor and nearly tripled the commercial average. The CPG’s accuracy was also demonstrated at the 2014 IEEE Symposium on Security and Privacy, where the insights provided by the CPG enabled the discovery of 18 previously unknown vulnerabilities in the source code of the Linux kernel.

Your unique code and all its dependencies

The CPG provides a single-pane-of-glass view of the unique elements that make up your entire application, including your custom code, OSS libraries, frameworks, and commercial SDKs. This perspective enables code analyses with context and greatly reduces the number of false positives. For example, the mere presence of a component with a known CVE may trigger a legacy SAST tool to call it out as a vulnerability requiring follow-up, whereas the CPG is able to inform you as to whether the affected component is actually being used in a vulnerable manner. If it is not, the CVE is removed from the scope of further efforts.
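One simplified way to picture this kind of triage is a reachability check over a call graph: an advisory only matters if the application can actually reach the affected method. The sketch below is an illustration under stated assumptions, not ShiftLeft's method; the function names, the advisory ids `ADVISORY-A` and `ADVISORY-B`, and the library and method names are all hypothetical, and a real analysis would also consider how data reaches the call.

```python
from collections import deque


def reachable_methods(call_graph, entry_points):
    """Breadth-first walk of the call graph from the application's entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage_advisories(call_graph, entry_points, advisories):
    """Label each advisory by whether any affected method is reachable."""
    reachable = reachable_methods(call_graph, entry_points)
    return {
        advisory_id: "reachable" if methods & reachable else "present but not called"
        for advisory_id, methods in advisories.items()
    }


# Hypothetical application: caller -> list of callees.
call_graph = {
    "app.main": ["app.load_config", "lib_xml.parse"],
    "app.load_config": ["lib_yaml.unsafe_load"],
}

# Hypothetical advisories: id -> set of affected library methods.
advisories = {
    "ADVISORY-A": {"lib_yaml.unsafe_load"},
    "ADVISORY-B": {"lib_xml.parse_entities"},
}

result = triage_advisories(call_graph, ["app.main"], advisories)
print(result)
```

In this toy data both libraries are dependencies of the application, but only the first advisory's affected method lies on a path from an entry point, so only that one would stay in scope for follow-up.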

Evidence of the value of this unique approach was shown in May 2018, when the CPG enabled ShiftLeft to discover the Jackson-databind deserialization vulnerability in the Nexmo 3.4.0 SDK (ShiftLeft Jackson-databind CVE Announcement).

Sensitive data flow & data leakage mapping

As applications have become more modular with microservices architecture, open-source libraries, commercial SDKs, and external APIs, mapping data flows across sources, transforms, and sinks has become far more complex. ShiftLeft’s CPG uses natural language processing (NLP) with industry-specific dictionaries to automatically identify critical data by variable name. Thus, the CPG can identify leakages, such as critical data being logged into Splunk in clear text or credentials inadvertently being stored in a public GitHub repository, during development, before they get pushed to production.

Common examples of data leakage that traditional code analysis methods and tools often miss, but that the CPG exposes, include critical data being inadvertently logged in clear text and credentials being hard-coded into the application.
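Identifying critical data by variable name can be approximated with a very small sketch: split each identifier into words and look the words up in a dictionary of sensitive terms, then flag flows from such variables into clear-text sinks. This is a deliberately naive stand-in for the NLP described above, not ShiftLeft's implementation; the dictionary `SENSITIVE_TERMS`, the sink list `CLEAR_TEXT_SINKS`, the function names, and the sample flows are all assumptions made for illustration.

```python
import re

# Hypothetical dictionary of sensitive terms; a real one would be industry-specific.
SENSITIVE_TERMS = {"password", "passwd", "secret", "token", "ssn", "iban"}

# Hypothetical sinks that write data in clear text.
CLEAR_TEXT_SINKS = {"logger.info", "print"}


def identifier_tokens(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    tokens = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs (SSN), capitalized or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]


def is_sensitive(name):
    """True if any whole word of the identifier is in the dictionary."""
    return any(token in SENSITIVE_TERMS for token in identifier_tokens(name))


def find_leaks(flows):
    """Keep (variable, sink) flows where sensitive data reaches a clear-text sink."""
    return [(variable, sink) for variable, sink in flows
            if is_sensitive(variable) and sink in CLEAR_TEXT_SINKS]


# Hypothetical data flows: (variable name, sink it reaches).
flows = [
    ("userPassword", "logger.info"),
    ("request_id", "logger.info"),
    ("api_token", "vault.store"),
]

print(find_leaks(flows))
```

Matching on whole words rather than substrings keeps a name like `tokenizer` from being flagged just because it contains `token`, which is one small way name-based detection can limit false positives.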

You can experience these insights into your own unique code with a no-obligation, complimentary ShiftLeft Data Leakage Assessment.

Integration into DevOps pipelines

The CPG provides the foundation for continuous, accurate, and rapid feedback across Dev, Sec, and Ops teams and from end to end in the SDLC. Automation of policies can be achieved through ShiftLeft’s seamless integration at several points in your deployment pipeline, depending on your needs: pull request, code commit, or during the build process. This is made easier by integrations into existing CI/CD tools, such as Jenkins, CircleCI, Travis, and Bamboo. This enables your teams to identify, address, and resolve issues earlier and with greater speed, precision, and efficiency.

Go beyond vulnerabilities to identify code weakness

While most code analysis tools are strictly focused on finding vulnerabilities, the CPG understands how information flows across your application. It can therefore also be used to identify code weaknesses that may be impacting performance and efficiency, such as methods with too many parameters, improperly sanitized inputs, duplicate code, and inconsistent naming conventions.
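As a rough illustration of one of these checks, the sketch below flags functions with too many parameters. It uses Python's standard `ast` module on Python source rather than the CPG, so it only shows the shape of such a query; the function name `methods_with_too_many_parameters`, the default limit of 5, and the sample source are assumptions made for this example.

```python
import ast


def methods_with_too_many_parameters(source, limit=5):
    """Return (function name, parameter count) pairs that exceed the limit."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count named parameters only; *args and **kwargs are not included.
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if count > limit:
                findings.append((node.name, count))
    return sorted(findings)


# Hypothetical source code to inspect.
sample_source = """
def ok(a, b):
    return a + b

def too_wide(a, b, c, d, e, f):
    return a
"""

print(methods_with_too_many_parameters(sample_source))
```

The same pattern, a traversal that selects method nodes and filters on a property, carries over to the other weaknesses listed above, although checks such as duplicate code need more than a single node's properties.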

An open-source schema

ShiftLeft has developed, and continues to support, a number of open-source projects and resources. These are available to address the needs of our growing community of software professionals. For example, the CPG schema is available as open source and is published as a suggestion for an open standard for exchanging code in intermediate representations along with analysis results.
