Clarpse - The Way Source Code Was Meant To Be Analyzed

Clarpse is a multi-language source code analysis tool designed for extracting deep relationships between entities in a codebase through a clean API. Clarpse makes developer tools like code search and static analyzers better. It supports the development of features like jump to definition, find usages, type inference, and documentation generation.

The Problem

Existing code parsing tools, such as the Eclipse JDT AST Parser, JavaParser, and Roaster for Java, simply expose information from generated ASTs through node visitors and event listeners. They are great at telling you about the properties of an individual source file. However, if you need to understand components in the context of the entire codebase in order to support complex functionality, you typically have to develop your own solution. Additionally, because there are no standard cross-language and cross-editor APIs and data formats, it is difficult to reuse that work for more than one programming language.

Let Clarpse Do The Hard Work

Clarpse acts as a layer in front of generated ASTs to populate language-agnostic source code models with all type and symbol information resolved at parse time. It exposes a clean, easy-to-use API and outputs data in a well-defined, language-independent format. External tools built on top of Clarpse only need to consume this data format in order to support every language Clarpse supports!

Clean Object Oriented API

Clarpse allows developers to analyze the OOP properties of code using a clean, object-oriented API. First, we generate an OOPSourceCodeModel object from a list of source files.

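// Source text of the file to analyze, built as one string with explicit newlines.
String code = "package foo.package;\n"
    + "/** Doc Comment */\n"
    + "@Deprecated\n"
    + "public interface Bicycle {\n"
    + "   void changeGear(Gear gear);\n"
    + "}";
ClarpseProject project = new ClarpseProject(
    new SourceFiles(new RawFile("file.java", code), Lang.JAVA));
OOPSourceCodeModel model = project.result();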

The OOPSourceCodeModel is a polyglot representation of a codebase. It is essentially a collection of Component objects that represent the various units of object-oriented systems (class/struct, method/function, field, etc.).

 Component interfaceComponent = model.get("foo.package.Bicycle");
 // retrieve properties of the interface component...
 interfaceComponent.annotations(); // --> {"Deprecated"}
 interfaceComponent.uniqueName();  // --> {"foo.package.Bicycle"}
 interfaceComponent.sourceFile();  // --> {"file.java"}
 interfaceComponent.comment();     // --> {" Doc Comment"}

Most importantly, however, we can see how components are related to each other through their ComponentInvocations, a collection of the external Component objects that the current Component depends on.

interfaceComponent.children().get(0).uniqueName();  // -->  "Bicycle.changeGear(Gear)"
interfaceComponent.children().get(0).invocations().get(0);  // --> TYPE_DECLARATION
interfaceComponent.children().get(0).invocations().get(0).invokedComponent();  // --> "foo.package.Gear"
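
To illustrate why this invocation data is useful, here is a minimal sketch of a "find usages" lookup built by inverting the "depends on" relation. It does not use Clarpse's classes: SimpleComponent and usages are hypothetical stand-ins invented for this example, and the component names are borrowed from the Bicycle example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FindUsagesSketch {

    // Hypothetical stand-in for a component: a unique name plus the unique
    // names of the components it invokes. This is NOT Clarpse's Component class.
    static final class SimpleComponent {
        final String uniqueName;
        final List<String> invokedNames;

        SimpleComponent(String uniqueName, String... invokedNames) {
            this.uniqueName = uniqueName;
            this.invokedNames = Arrays.asList(invokedNames);
        }
    }

    // Inverts "component -> what it invokes" into "component -> who invokes it".
    static Map<String, List<String>> usages(List<SimpleComponent> components) {
        Map<String, List<String>> usedBy = new LinkedHashMap<>();
        for (SimpleComponent component : components) {
            for (String target : component.invokedNames) {
                usedBy.computeIfAbsent(target, k -> new ArrayList<>())
                      .add(component.uniqueName);
            }
        }
        return usedBy;
    }

    public static void main(String[] args) {
        List<SimpleComponent> components = Arrays.asList(
            new SimpleComponent("foo.package.Bicycle"),
            new SimpleComponent("Bicycle.changeGear(Gear)", "foo.package.Gear"),
            new SimpleComponent("foo.package.Gear"));

        // Which components depend on Gear?
        System.out.println(usages(components).get("foo.package.Gear"));
        // prints [Bicycle.changeGear(Gear)]
    }
}
```

Because the index is keyed by unique name only, the same lookup would work regardless of which language the components were parsed from.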

Polyglot

No matter what programming language is being parsed (Java and Go are currently supported), Clarpse will always generate the same object-oriented data model that abstracts away any language-specific details. Clarpse is designed to be modular and extensible, so new languages can be added with little effort.
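
As a rough illustration of the idea, the sketch below shows one way a per-language plug-in point could feed a single shared output format. It is a toy and not how Clarpse is implemented: Clarpse works on generated ASTs, whereas this example uses regular expressions, and LanguageParser, RegexParser, and analyze are hypothetical names made up for the sketch.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolyglotSketch {

    enum Lang { JAVA, GO }

    // Hypothetical plug-in point: one implementation per supported language,
    // all returning the same language-neutral result (qualified type names).
    interface LanguageParser {
        List<String> typeNames(String source);
    }

    // Toy parser driven by two regular expressions; a real tool would walk an AST.
    static final class RegexParser implements LanguageParser {
        private final Pattern packagePattern;
        private final Pattern typePattern;

        RegexParser(String packageRegex, String typeRegex) {
            this.packagePattern = Pattern.compile(packageRegex);
            this.typePattern = Pattern.compile(typeRegex);
        }

        @Override
        public List<String> typeNames(String source) {
            Matcher pkg = packagePattern.matcher(source);
            String prefix = pkg.find() ? pkg.group(1) + "." : "";
            List<String> names = new ArrayList<>();
            Matcher type = typePattern.matcher(source);
            while (type.find()) {
                names.add(prefix + type.group(1));
            }
            return names;
        }
    }

    // Supporting a new language means registering one more parser here.
    static final Map<Lang, LanguageParser> PARSERS = new EnumMap<>(Lang.class);
    static {
        PARSERS.put(Lang.JAVA, new RegexParser(
            "package\\s+([\\w.]+)\\s*;", "(?:class|interface)\\s+(\\w+)"));
        PARSERS.put(Lang.GO, new RegexParser(
            "package\\s+(\\w+)", "type\\s+(\\w+)\\s+(?:struct|interface)"));
    }

    static List<String> analyze(Lang lang, String source) {
        return PARSERS.get(lang).typeNames(source);
    }

    public static void main(String[] args) {
        System.out.println(analyze(Lang.JAVA, "package foo;\npublic interface Bicycle { }"));
        System.out.println(analyze(Lang.GO, "package foo\ntype Bicycle struct { }"));
        // both lines print [foo.Bicycle]
    }
}
```

Callers only ever see the list of qualified names, so code written against analyze does not change when another language is registered.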

Get involved

We are currently fixing bugs and adding support for more languages. Help us out by using Clarpse and sending us feedback, spreading the word, or contributing code!


Muntazir Fadhel


Muntazir Fadhel is an Infrastructure Engineer with over 5 years of experience in developing and deploying distributed architectures, including microservices and machine learning pipelines. He's particularly interested in software architecture and deployment, especially as they relate to cloud environments.
