Data Extraction Tools
The data extraction toolkit currently consists of 4 products:
Many PDF documents are created without bookmarks. tick TOC bookmarks every page referenced by a document's contents page, so that the reader can jump directly to any listed page from anywhere in the document. tick TOC can work with multiple contents pages, whether or not they are hierarchical, and will work with any document that has a contents page.
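The step of turning a contents page into bookmarks can be pictured with a short sketch. This is not tick TOC's implementation. It assumes that each contents entry ends with a printed page number, that indentation signals hierarchy, and that a hypothetical `page_offset` parameter maps printed page numbers to PDF page positions. `parse_contents` and `Bookmark` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical contents-line pattern: leading indentation, a title,
# dot leaders and/or spaces, then a trailing printed page number.
CONTENTS_LINE = re.compile(r"^(?P<indent>\s*)(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$")


@dataclass
class Bookmark:
    title: str
    level: int     # 0 = top level; each indent step adds one level
    pdf_page: int  # zero-based page index within the PDF file


def parse_contents(lines: List[str], page_offset: int = 0, indent_width: int = 2) -> List[Bookmark]:
    """Turn contents-page lines into bookmark entries.

    page_offset is an assumed correction from printed page numbers to PDF
    page positions (for example, to allow for unnumbered cover pages).
    """
    bookmarks = []
    for line in lines:
        m = CONTENTS_LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # headings, blank lines and other non-entries are skipped
        indent = len(m.group("indent").expandtabs(indent_width))
        printed_page = int(m.group("page"))
        bookmarks.append(
            Bookmark(
                title=m.group("title").strip(),
                level=indent // indent_width,
                pdf_page=printed_page - 1 + page_offset,
            )
        )
    return bookmarks


contents = [
    "Contents",
    "Chairman's statement ..... 4",
    "  Outlook ..... 6",
    "Financial statements ..... 20",
]
for bm in parse_contents(contents, page_offset=2):
    print(bm)
```

The offset is there because printed page numbers often do not match physical page positions once covers and front matter are counted. A flat contents page simply yields every entry at level 0. Writing the entries into the PDF's outline would be a separate step using a PDF library.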
DOC key is an indexing tool for documents. Standardised indexes for a document can be created at the touch of a button using either a web or desktop interface.
DOC key takes a different approach from other indexing tools. Rather than cataloguing a document, it works with the document's inherent structure to produce a tighter, highly relevant index of key content. DOC key creates a highly accurate single-hit index by identifying the one most relevant section for your selection criteria. It does this by isolating the attributes that make a section unique within the document model (the X Layer). The document model is extensible, so selection criteria can be continually refined to expand the range and type of indexing. See the X Layer for more details.
By working with a document's inherent structure, we are able to achieve unparalleled levels of accuracy. To this end, DOC key uses four separate intelligent processing engines to isolate the correct result.
DOC key cuts the time taken to extract data directly from reports. Analysts who need to work from the original source will benefit from faster modelling of company financial data, as will data vendors engaged in the mass extraction of financial data from source documents.
Other benefits include faster verification of modelled data against the original source document and the capability to restrict a search to specific sections across multiple documents.
DOC key was originally designed to work with financial reports but could work with any set of documents that contain a common set of key data. As with financial reports, the data set does not have to be standardised, but there would usually need to be some commonality of purpose behind the documents around which we could build a standardised index.
You can try out DOC key for yourself. Please contact us to arrange a trial.
2 Source is an Excel add-in that lets the user step back through any financial ratio calculation to the underlying numbers in the source documents.
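To illustrate what stepping back through a calculation involves, here is a minimal sketch that stores each source number together with where it came from, and has each calculated figure keep the inputs it was computed from, so any result can be traced down to its sources. It is not how 2 Source works inside Excel. `SourceValue`, `Calc`, `trace` and `describe` are hypothetical names, and the figures are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple, Union


@dataclass(frozen=True)
class SourceValue:
    """A number taken from a source document, with its location (hypothetical fields)."""
    value: float
    label: str
    document: str
    page: int


@dataclass(frozen=True)
class Calc:
    """A calculated figure that remembers the inputs it was computed from."""
    name: str
    op: Callable[..., float]
    operands: Tuple["Node", ...]

    @property
    def value(self) -> float:
        return self.op(*(operand.value for operand in self.operands))


Node = Union[SourceValue, Calc]


def trace(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Walk from a result back to its source numbers, depth-first."""
    yield depth, node
    if isinstance(node, Calc):
        for operand in node.operands:
            yield from trace(operand, depth + 1)


def describe(node: Node) -> str:
    if isinstance(node, SourceValue):
        return f"{node.label}: {node.value} ({node.document}, p. {node.page})"
    return f"{node.name}: {node.value}"


# Illustrative figures only.
debt = SourceValue(500.0, "Borrowings", "annual_report.pdf", 62)
cash = SourceValue(100.0, "Cash", "annual_report.pdf", 60)
equity = SourceValue(800.0, "Total equity", "annual_report.pdf", 61)
net_debt = Calc("Net debt", lambda d, c: d - c, (debt, cash))
gearing = Calc("Gearing", lambda nd, eq: nd / eq, (net_debt, equity))

for depth, node in trace(gearing):
    print("  " * depth + describe(node))
```

Because intermediate results such as net debt are themselves `Calc` nodes, the walk works for ratios built from other calculated figures, not only from raw inputs.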
The X Layer provides access to the unique document model used by tick TOC and DOC key. While the document content is being structured, data anomalies and formatting issues are resolved before two comprehensive, integrated models of the underlying textual and numerical data are created. As a result, the X Layer has a very strong understanding of the numerical relationships within a document.
The X Layer is therefore well suited to extracting and indexing numerical data into standardised tables. The numerical model is not restricted to tables, which makes the X Layer particularly useful for extracting numerical data from blocks of running text.
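For comparison, a naive way to pull figures out of running text is a regular expression. The sketch below is a swapped-in technique, not the X Layer's method: it assumes figures may carry a currency symbol (£, $ or €), thousands separators, decimals, a unit suffix (`%`, `m`, `bn`) and accounting-style brackets for negatives. `extract_figures` and `Figure` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical pattern for figures in narrative text: optional opening bracket
# (accounting-style negative), optional currency symbol, digits with optional
# thousands separators and decimals, optional unit, optional closing bracket.
FIGURE = re.compile(
    r"(?<![\w.,])"
    r"(?P<open>\()?"
    r"(?P<currency>[£$€])?"
    r"(?P<digits>\d{1,3}(?:,\d{3})+|\d+)"
    r"(?P<decimals>\.\d+)?"
    r"(?:\s?(?P<unit>%|bn\b|m\b))?"
    r"(?P<close>\))?"
)


class Figure(NamedTuple):
    value: float
    currency: Optional[str]
    unit: Optional[str]
    text: str


def extract_figures(text: str) -> List[Figure]:
    figures = []
    for m in FIGURE.finditer(text):
        value = float(m.group("digits").replace(",", "") + (m.group("decimals") or ""))
        # Treat "(4.2)" as negative only when both brackets are present.
        if m.group("open") and m.group("close"):
            value = -value
        figures.append(Figure(value, m.group("currency"), m.group("unit"), m.group(0)))
    return figures


sentence = "Revenue rose 12% to £1,200.5m, while the operating loss was (£4.2m) in 2023."
for fig in extract_figures(sentence):
    print(fig)
```

On the sample sentence the pattern picks up the year 2023 as readily as the financial figures, and it cannot tell which figure is revenue and which is the loss. Pattern matching alone finds numbers; interpreting them needs a model of how the numbers relate to the surrounding text.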