Welcome to the homepage of the Sourcerer Project!

Sourcerer is an ongoing research project at the University of California, Irvine aimed at exploring the open source phenomenon through the use of code analysis. The open source movement has generated an extremely large body of code, which presents a tremendous opportunity to software engineering researchers. Not only do we leverage this code for our own research, but we also provide the open source Sourcerer Infrastructure for other researchers to use.

Sourcerer Infrastructure

The Sourcerer Infrastructure is organized into several layers, described below.

Core Infrastructure

At the lowest layer, the core infrastructure is a set of Java tools for crawling, downloading, processing, and indexing open source Java projects. It is available on GitHub under the GNU GPL.


While the core infrastructure allows one to automatically crawl and download open source projects, we also provide our current repository to anyone interested. More information can be found on the repository page.

Sourcerer DB

Once Sourcerer's repository is constructed, we populate Sourcerer DB, a relational database, with structure and reference information extracted from the project source code. The services that the Sourcerer Infrastructure provides, including the code search service, are all built on top of Sourcerer DB. We provide direct read-only access to Sourcerer DB to those who are interested.


The Sourcerer Infrastructure provides a number of higher-level services that researchers or application designers can leverage to create rich next-generation search applications. These services include repository exploration, code search and dependency slicing.

Code Search Engine

The first application we built using the Sourcerer Infrastructure was a code search engine. A running version of the search engine can be found here.