The AnyDocx engine supports a variety of input documents. The retrieval process lets the interface accept a range of input types, such as Word, PDF, images, and XML, from sources including fax, billing systems, custom feeds, or even a custom data warehouse implementation.
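One way to picture this retrieval interface is a dispatcher that routes each incoming document to a handler for its type. This is only a sketch: the handler names and the registry are illustrative assumptions, not the actual AnyDocx API.

```python
from pathlib import Path

# Hypothetical registry mapping file extensions to ingestion routines.
HANDLERS = {}

def handler(*extensions):
    """Register an ingestion routine for one or more file extensions."""
    def register(fn):
        for ext in extensions:
            HANDLERS[ext] = fn
        return fn
    return register

@handler(".doc", ".docx")
def ingest_word(path):
    return {"type": "word", "source": str(path)}

@handler(".pdf")
def ingest_pdf(path):
    return {"type": "pdf", "source": str(path)}

@handler(".xml")
def ingest_xml(path):
    return {"type": "xml", "source": str(path)}

def retrieve(path):
    """Dispatch an incoming document to the handler for its extension."""
    ext = Path(path).suffix.lower()
    if ext not in HANDLERS:
        raise ValueError(f"unsupported input type: {ext}")
    return HANDLERS[ext](path)
```

A fax or billing-system feed would plug in the same way, by registering a handler for whatever identifier that source provides instead of a file extension.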
The received documents are converted into lightweight, high-quality images by refining them with image processing algorithms such as thresholding, Gaussian filtering, and Canny edge detection. This refined output, called partdata, is stored in the cloud for future use and for training the system to identify patterns specific to an organization.
The partdata is processed to collect various types of metadata, such as image quality metrics, size, and source, which are used to streamline the data analytics. At this stage we also define templates for reading the partdata and assign them to particular partdata patterns.
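The metadata record and the pattern-to-template assignment might look like the following sketch. The field names, the predicate-based registry, and the "fax-invoice" template are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PartdataMeta:
    source: str             # e.g. "fax", "billing"
    width: int              # image size in pixels
    height: int
    quality_score: float    # simple image-quality metric, 0.0 to 1.0

# Hypothetical template registry: each template carries a predicate that
# decides whether a given piece of partdata matches its pattern.
TEMPLATES = []

def register_template(name, matches):
    TEMPLATES.append((name, matches))

def assign_template(meta):
    """Return the first template whose pattern matches this partdata."""
    for name, matches in TEMPLATES:
        if matches(meta):
            return name
    return "default"

register_template("fax-invoice", lambda m: m.source == "fax")
```

Anything that matches no registered pattern falls back to a default template, so new document patterns can be onboarded without breaking existing feeds.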
The partdata is then fed into the data extraction engine, which applies techniques such as OCR, phonetic correction, and sentence and word correction. The crude data collected is then refined into meaningful data that is used to construct the big data analytics.
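The word-correction step can be sketched with the standard library's fuzzy matcher: each OCR-garbled token is snapped to the closest word in a known vocabulary. The vocabulary and the similarity cutoff here are assumptions; a real deployment would train these from the organization's own partdata.

```python
import difflib

# Assumed domain vocabulary; in practice this would be learned per organization.
VOCAB = ["invoice", "amount", "total", "customer", "date"]

def correct_word(word, vocab=VOCAB, cutoff=0.6):
    """Replace an OCR-garbled token with the closest known word, if any."""
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_text(text):
    """Apply word-level correction across a whole extracted line."""
    return " ".join(correct_word(w) for w in text.split())
```

Phonetic correction works on the same principle but compares sound codes (e.g. Soundex keys) instead of character sequences, which catches errors that look different but sound alike.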
The data collected in the previous step is then used to define basic structures around it so that it can be placed properly in the big data setup. Based on the data's metadata, the appropriate big data configuration is chosen.
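The metadata-driven choice of storage setup can be sketched as a simple routing rule. The store names and the deciding attributes below are assumptions for illustration, not AnyDocx's actual storage tiers.

```python
def choose_store(meta):
    """Pick a big data storage setup from a record's metadata dict.

    Hypothetical rules: high-volume structured data goes to a columnar
    warehouse, other structured data to a relational store, and
    everything else to an object lake.
    """
    if meta.get("structured") and meta.get("volume") == "high":
        return "columnar-warehouse"
    if meta.get("structured"):
        return "relational-store"
    return "object-lake"
```

Keeping this decision in one routing function means new storage backends can be added without touching the extraction stages upstream.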
The structured data is then fed into the clients' dashboards through custom reports built specifically for the organization, in addition to the default report setup that ships with AnyDocx.
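One plausible way to combine the two report sets is to overlay the organization-specific reports on the defaults, with custom definitions taking precedence on a name clash. The report names and descriptions below are illustrative assumptions.

```python
# Assumed built-in reports that ship with the product.
DEFAULT_REPORTS = {
    "volume": "documents processed per day",
    "quality": "average image-quality score",
}

def dashboard_reports(custom_reports):
    """Default reports overlaid with organization-specific custom reports;
    a custom report with the same name replaces the default."""
    reports = dict(DEFAULT_REPORTS)
    reports.update(custom_reports)
    return reports
```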
This feature also exposes selected data as web services, based on client requirements. This is useful for integration with other systems such as Salesforce, ERP implementations like SAP or Oracle EPM, and EDI systems.
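The core of such an export is serializing only the client-approved fields of each record into a payload a web service can serve. The function below is a minimal sketch of that filtering step, with the field names as assumptions; the HTTP layer and the per-client field lists would sit around it.

```python
import json

def export_payload(records, allowed_fields):
    """Serialize only the client-approved fields of each record as JSON,
    ready to be returned from a web service endpoint."""
    filtered = [
        {key: rec[key] for key in allowed_fields if key in rec}
        for rec in records
    ]
    return json.dumps({"records": filtered})
```

Because each integration target (Salesforce, an ERP, an EDI gateway) gets its own allowed-field list, sensitive columns never leave the system unless a client's contract includes them.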