Enterprise Engine

Our Enterprise Engine has been specifically designed to facilitate the development of enterprise solutions such as cloud or in-house services. The engine offers the same SDK as Cerbero Suite Advanced and has already analyzed billions of files.

Security

Our engine has been designed to withstand the security issues that arise when analyzing malicious files: buffer overflows, integer overflows, infinite loops, infinite recursion, decompression bombs, denial-of-service attacks and so on. It has safely analyzed billions of files over the years.

Cross-Platform

Like Cerbero Suite, our Enterprise Engine is cross-platform. We currently offer it for both Windows (x86, x64) and Linux (x64). It is also compatible with older versions of Windows (e.g. XP) and Linux (e.g. Ubuntu 12).

Documentation

We provide in-depth documentation for our SDK. The documentation not only covers the API, but also details key concepts and includes many code examples.

Embedding

Our Enterprise Engine is deployed as an embeddable module: a Dynamic-Link Library on Windows and a Shared Library on Linux. It’s possible to embed the engine from both C/C++ and Python 3.

Embedding the engine from Python is extremely simple:

from ProEngine import *

# initialize the engine
proEngineInit()

# from now on the SDK can be accessed
from Pro.PDF import *
# ...

# finalize the engine before exiting
proEngineFinal()
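
Since every call to proEngineInit() has to be matched by a call to proEngineFinal(), the pair can be wrapped in a context manager so that finalization also runs when the code in between raises. The following is only a minimal sketch: engine_session is a hypothetical helper that is not part of the SDK, and it assumes that both functions can be called without arguments, as in the snippet above. Stand-in callables are used so that the sketch runs without the engine installed.

```python
from contextlib import contextmanager

@contextmanager
def engine_session(init, final):
    """Call init() on entry and final() on exit, even if the body raises.

    Hypothetical helper, not part of the SDK.
    """
    init()
    try:
        yield
    finally:
        final()

# Stand-ins so the sketch runs without the engine installed;
# real code would pass proEngineInit and proEngineFinal instead.
calls = []
with engine_session(lambda: calls.append("init"), lambda: calls.append("final")):
    calls.append("work")
print(calls)
```

With the engine available, the same pattern would read: with engine_session(proEngineInit, proEngineFinal): followed by the SDK imports and the analysis code.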

Embedding the engine from C/C++ is also very simple: it only requires including the ProEngine header and specifying the location of the engine on disk:

#define PRO_ENGINE_INIT
#include "ProEngine.h"

int main()
{
    /* initialize the engine */
    if (!proEngineInit("/path/to/the/engine", ProEngine_InitPython))
        return -1;

    /* from now on the SDK can be accessed */

    /* finalize the engine before exiting */
    proEngineFinal();
    return 0;
}

Simplicity

Our SDK is not only intuitive, but also flexible. For example, extracting JavaScript code from a PDF document can be achieved by creating a hook extension. The following code snippet is all that is necessary to extract JavaScript code from any PDF document, even if encrypted or contained in an archive.

from Pro.Core import *

def printJSEntry(sp, xml, tnode):
    # data node
    dnode = xml.findChild(tnode, "d")
    if not dnode:
        return
    # we let Cerbero extract the JavaScript for us
    params = NTStringVariantHash()
    params.insert("op", "js")
    idnode = xml.findChild(dnode, "id")
    if idnode:
        params.insert("id", int(xml.value(idnode), 16))
    ridnode = xml.findChild(dnode, "rid")
    if ridnode:
        params.insert("rid", int(xml.value(ridnode), 16))
    js = sp.customOperation(params)
    # print out the JavaScript
    print("JS CODE")
    print("-------")
    print(js)

def pdfExtractJS(sp, ud):
    xml = sp.getReportXML()
    # object node
    onode = xml.findChild(None, "o")
    if onode:
        # scan node
        snode = xml.findChild(onode, "s")
        if snode:
            # enumerate scan entries
            tchild = xml.firstChild(snode)
            while tchild:
                if xml.name(tchild) == "t":
                    # type attribute
                    tattr = xml.findAttribute(tchild, "t")
                    # check if it's a JavaScript entry
                    if tattr and int(xml.value(tattr)) == CT_JavaScript:
                        printJSEntry(sp, xml, tchild)
                tchild = xml.nextSibling(tchild)

By using hooks and other extensions our scanning engine does most of the heavy lifting!

Alternatively, our SDK can be used to perform custom parsing operations, without relying on the scanning engine. The following code snippet parses a PDF document and for every PDF object it prints out the ID, its dictionary and the decoded stream content. The code even contains logic to detect unreferenced objects, which can be found in corrupted or malicious PDF documents, and decrypts encrypted PDF documents.

# iterate through all the objects in a PDF
from ProEngine import *

def parsePDF(fname):
    # open the file
    c = createContainerFromFile(fname)
    if c.isNull():
        print("error: couldn't open file")
        return
    # load the file as PDF
    pdf = PDFObject()
    if not pdf.Load(c):
        print("error: invalid file format")
        return
    # parse all referenced objects
    objtable = pdf.BuildObjectTable()
    # detect unreferenced objects 
    # (corrupted or malicious PDFs may contain them)
    pdf.DetectObjects(objtable)
    # store the object table internally
    pdf.SetObjectTable(objtable)
    # process PDF encryption
    if not pdf.ProcessEncryption():
        print("warning: couldn't decrypt file")
    # [optional] sort objects by ID
    oids = []
    it = objtable.iterator()
    while it.hasNext():
        oid, _ = it.next()
        oids.append(oid)
    oids.sort()
    # iterate through the objects
    for oid in oids:
        # print out the object id
        print("\nOBJECT ID:", oid >> 32, "\n")
        # parse the object
        ret, dictn, content, info = pdf.ParseObject(objtable, oid)
        if not ret:
            print("warning: couldn't parse object %d" % (oid >> 32,))
            continue
        # print out the object dictionary
        it = dictn.iterator()
        while it.hasNext():
            k, v = it.next()
            print("   ", k, "-", v)
        # print out the decoded object stream
        content = pdf.DecodeObjectStream(content, dictn, oid)
        if not content:
            continue
        out = NTTextBuffer()
        out.printHex(content)
        print("\n", out.buffer)
            
if __name__ == "__main__":
    import sys
    proEngineInit()
    from Pro.Core import *
    from Pro.PDF import *
    parsePDF(sys.argv[1])
    proEngineFinal()

This is the output of a single PDF object:

OBJECT ID: 50 

    /Length - 371
    /Filter - /FlateDecode

         0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F    Ascii

0000   2F 43 49 44 49 6E 69 74  2F 50 72 6F 63 53 65 74    /CIDInit/ProcSet
0010   20 66 69 6E 64 72 65 73  6F 75 72 63 65 20 62 65     findresource be
0020   67 69 6E 0A 31 32 20 64  69 63 74 20 62 65 67 69    gin.12 dict begi
0030   6E 0A 62 65 67 69 6E 63  6D 61 70 0A 2F 43 49 44    n.begincmap./CID
0040   53 79 73 74 65 6D 49 6E  66 6F 3C 3C 0A 2F 52 65    SystemInfo<<./Re
0050   67 69 73 74 72 79 20 28  41 64 6F 62 65 29 0A 2F    gistry (Adobe)./
0060   4F 72 64 65 72 69 6E 67  20 28 55 43 53 29 0A 2F    Ordering (UCS)./
0070   53 75 70 70 6C 65 6D 65  6E 74 20 30 0A 3E 3E 20    Supplement 0.>> 
...
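
The parsing script prints oid >> 32 as the object ID, which suggests that the keys of the object table pack the PDF object number into the upper 32 bits of a 64-bit integer. The sketch below illustrates that packing. It assumes, without confirmation from the text above, that the lower 32 bits hold the generation number. pack_oid and unpack_oid are hypothetical helpers, not SDK functions.

```python
MASK32 = 0xFFFFFFFF

def pack_oid(number, generation=0):
    """Combine an object number and a generation into one 64-bit key.

    Assumption: upper 32 bits = object number, lower 32 bits = generation.
    """
    return ((number & MASK32) << 32) | (generation & MASK32)

def unpack_oid(oid):
    """Split a packed key back into (object number, generation)."""
    return oid >> 32, oid & MASK32

oid = pack_oid(50, 0)
print(unpack_oid(oid))
```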

The following code snippet scans all the files contained in a specified directory and its sub-directories:

import os
from ProEngine import *

def scanFiles(scandir):
    rootdir = os.path.realpath(scandir)
    for dirname, _, filelist in os.walk(rootdir):
        print("Scanning directory: %s" % dirname)
        for fname in filelist:
            fullpath = os.path.join(dirname, fname)
            print('\tScanning %s... ' % fname, end="")
            r = proEngineScanFile(fullpath, return_report=True)
            print("OK" if r else "FAIL")
            
if __name__ == "__main__":
    proEngineInit()
    scanFiles("/path/to/scan")
    proEngineFinal()
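
For batch jobs it can be more convenient to collect the results than to print them. The following sketch is a variation on the snippet above under stated assumptions: scan_tree is a hypothetical helper, and the scanner is passed in as a callable so that the example runs without the engine. In real use the callable could be something like lambda p: proEngineScanFile(p, return_report=True). Whether that function can raise exceptions is not stated here, so the try/except is purely defensive.

```python
import os
import tempfile

def scan_tree(root, scan_file):
    """Walk root and return {path: bool} using the scan_file callable.

    Hypothetical helper, not part of the SDK.
    """
    results = {}
    for dirname, _, filelist in os.walk(os.path.realpath(root)):
        for fname in sorted(filelist):
            fullpath = os.path.join(dirname, fname)
            try:
                results[fullpath] = bool(scan_file(fullpath))
            except Exception:
                # one problematic file should not abort the whole run
                results[fullpath] = False
    return results

def demo():
    # Stub scanner standing in for the engine: it "fails" on empty files.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "a.bin"), "wb") as f:
            f.write(b"data")
        open(os.path.join(tmp, "sub", "empty.bin"), "wb").close()
        results = scan_tree(tmp, lambda p: os.path.getsize(p) > 0)
        failed = sorted(os.path.basename(p) for p, ok in results.items() if not ok)
        return len(results), failed

print(demo())
```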

From C/C++ it’s possible to execute Python scripts, evaluate Python code, retrieve Python variables and scan files. In the following code snippet a global variable is set from Python and then retrieved from C.

#include <stdio.h>   /* puts */
#include <string.h>  /* strcmp */

#define PRO_ENGINE_INIT
#include "ProEngine.h"

int main()
{
    /* initialize the engine */
    if (!proEngineInit("/path/to/the/engine", ProEngine_InitPython))
        return -1;

    /* set a variable in Python and retrieve it from C */
    proEngineEvalPythonCode("some_value = 'kawaii'");

    char *r = proEngineGetPythonGlobalVariable("some_value");

    if (r != NULL && strcmp(r, "kawaii") == 0)
        puts("SUCCESS");
    else
        puts("FAIL");

    proEngineFreeMemory(r);

    /* finalize the engine before exiting */
    proEngineFinal();
    return 0;
}

Our SDK makes it trivial to pass variables between Python and C/C++.

Speed

Although our SDK is exposed in Python, the engine itself is written in C++ and is multi-threaded. This design keeps analysis fast, while letting our customers write cross-platform code that remains compatible across different versions of the engine.

Stability

Since our SDK is in Python, our customers don’t need to worry about rebuilding their project when the engine is updated. Not only that, but we take great care not to introduce breaking changes to our SDK: we don’t want our customers to worry about their code not working anymore when updating the engine.

Range

Our SDK is vast. It features support for dozens of file formats, disassembly, decompiling, emulation, file scanning, signature matching, file carving, decompression, decryption and much more.

If you’re in doubt whether our engine fits your use-case, you can contact us for a free consultation!

Quality

We make sure our engine keeps up with the latest threats and challenges presented by file formats which are difficult to analyze. We offer state-of-the-art support for various file types such as Adobe PDF and Microsoft Office.

Flexible Licensing

Our Enterprise Engine is licensed on a per-customer basis. The licensing depends upon the scope of the project. If you are interested in a quotation, please get in touch with us.

Suite Discounts

Purchasing a license of our Enterprise Engine comes with discounted lab licenses for Cerbero Suite!

By using Cerbero Suite, engineers can interactively debug parsing issues, analyze edge cases, use our Python editor and create graphical applications that work together with the engine.

Priority Support

Our Enterprise Engine customers have access to our priority support. We understand the criticality of enterprise applications and hence provide the fastest support for our engine.