DOCUMENT PROCESSING SOLUTIONS


EXTRACT DATA > AUTOMATIC VALIDATION > FRAUD DETECTION






We work with organizations to analyze their technologies and processes, and then design and implement solutions for the automatic processing and verification of documents, the extraction of data to feed other systems, and the prevention and detection of fraud. We apply innovative methodologies that translate into operational and financial gains and contribute to higher quality and efficiency in organizational processes.
In this context, we develop solutions for the automatic processing and verification of documents according to the specific needs of organizations in different business areas (e.g. Banking, Insurance, Utilities, Public Administration and the private sector).



TECHNOLOGY

The complexity of our customers' needs requires an approach different from the one traditionally used by the IT industry. Current solutions for mass digitization and document management in large organizations still fall short of expectations, because they lack robustness and depend heavily on human intervention for business processes to run successfully. Our offer of value-added services is based on three pillars: Machine Learning, the Print & Scan Channel and Optical Document Processing (ODP).


Machine Learning

Machine Learning is an area of artificial intelligence that develops techniques allowing systems to learn from a set of data. What is learned can then be used to detect patterns, form groups (clusters), make predictions or classify samples. For example, Machine Learning algorithms can be applied to historical credit data to detect patterns of fraudulent credit applications, and the resulting model can help classify the risk of each new application. Machine Learning can also be used to correct the errors that remain after OCR is applied to scanned documents.
There are several Machine Learning algorithms, such as SVM (Support Vector Machine), neural networks and logistic regression. They are generally classified into two broad categories. Supervised algorithms are trained on data formed by a set of samples together with the class to which each sample belongs, such as the data submitted in credit applications, each labeled as fraud or no fraud. Unsupervised algorithms work on data samples that are not labeled as belonging to any class.
Choosing an algorithm requires adequate knowledge and must take into account the objective of the problem to be solved, how the problem varies over time, the available computational power and the volume of data available.
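The difference between the two categories can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a description of our production models: the two features, the six sample applications, the function names and the training parameters are all invented for the example. The supervised part fits a small logistic regression on labeled samples, and the unsupervised part groups the same samples with a basic k-means, without using the labels.

```python
import math

# Hypothetical toy data, invented for illustration only.
# Features: (requested amount / monthly income, years at current address)
# Label: 1 = fraud, 0 = no fraud.
LABELED = [
    ((0.5, 10.0), 0),
    ((1.0, 8.0), 0),
    ((1.5, 6.0), 0),
    ((8.0, 0.5), 1),
    ((9.0, 1.0), 1),
    ((10.0, 0.2), 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, epochs=2000, lr=0.05):
    """Supervised: fit weights by stochastic gradient descent on labeled samples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def predict(model, x):
    """Return 1 (fraud) when the estimated probability is at least 0.5."""
    w, b = model
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


def kmeans_two_clusters(points, iterations=10):
    """Unsupervised: split unlabeled points into two groups (k-means, k=2).

    The first and last points are used as initial centroids to keep the
    sketch deterministic.
    """
    centroids = [points[0], points[-1]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            assignment[i] = dists.index(min(dists))
        for k in range(2):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment


model = train_logistic(LABELED)
print("new application classified as:", predict(model, (9.5, 0.3)))

unlabeled = [x for x, _ in LABELED]
print("cluster of each sample:", kmeans_two_clusters(unlabeled))
```

In the supervised case the fraud labels drive the training. In the unsupervised case the algorithm only finds groups, and it is still up to an analyst to decide what each group means.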


"Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed"
Arthur Samuel. 1959

Print & Scan Channel

The content of documents that go through printing and scanning suffers significant distortions that hamper the implementation of automated document processing solutions. This printing-and-scanning pair is known as the Print & Scan Channel. It has been modeled mathematically using advanced signal processing techniques from communications, where the printing device is the transmitter, the paper is the transmission medium and the scanner, tablet or smartphone is the receiver.
The formulas that model the Print & Scan Channel describe a non-linear channel that varies in time and space and has a high level of noise. They mathematically relate an original document to its printed and scanned version. It is precisely this characterization of a behavior that at first sight appears purely random that allows our team of engineers to offer differentiated added value compared to traditional document processing services, such as detecting signs of tampering in printed documents or improving the interpretation of the content of a scanned document.
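To give an intuition of what such a channel does, here is a toy simulation. It is a simplified sketch and not the mathematical model referred to above: the function name, the blur kernel, the gamma curve, the shading term and the noise level are all assumptions chosen for illustration. It passes one row of pixel intensities (0.0 = white, 1.0 = black) through a spatial blur, a non-linear tone response, a position-dependent attenuation and additive Gaussian noise.

```python
import random


def print_scan_channel(row, blur=(0.25, 0.5, 0.25), gamma=1.8,
                       shading=0.1, noise_sigma=0.05, seed=0):
    """Toy print-and-scan distortion of one row of intensities in [0, 1].

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    n = len(row)
    half = len(blur) // 2
    out = []
    for i in range(n):
        # Spatial blur: weighted average of neighbours (edges are clamped).
        acc = 0.0
        for k, weight in enumerate(blur):
            j = min(max(i + k - half, 0), n - 1)
            acc += weight * row[j]
        # Non-linear tone response of the printer and scanner.
        value = acc ** gamma
        # Space-variant behaviour: attenuation grows along the row.
        position = i / (n - 1) if n > 1 else 0.0
        value *= 1.0 - shading * position
        # Additive noise, then clipping to the valid intensity range.
        value += rng.gauss(0.0, noise_sigma)
        out.append(min(max(value, 0.0), 1.0))
    return out


original = [0, 0, 1, 1, 0, 0]  # a sharp black stroke on white paper
print(print_scan_channel(original))
```

Even in this simplified form, the sharp stroke comes out smeared, darker or lighter than the original, and slightly different on every simulated pass. Real models of the channel have to capture these effects well enough to tell normal distortion apart from signs of tampering.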

Optical Document Processing

Optical Document Processing (ODP) encompasses all the technologies applied to the automatic processing of all types of digital documents, from invoices to national identity documents (DNI). It automates document management workflows in organizations that need to process large volumes of documents and meet deadlines. The main processes are:

  • OCR on scanned documents;

  • Automatic data search in generic documents and automatic data association to logical tags using Machine Learning techniques;

  • Detection of anomalies in the morphological structure of documents, combining traditional image processing techniques with Machine Learning techniques;

  • Anomaly detection in documents that were poorly standardized, combining OCR and Machine Learning techniques.
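
As a simple illustration of associating extracted data with logical tags, the sketch below matches field labels read by OCR against a list of known labels, tolerating typical OCR misreadings. It is a stand-in for the Machine Learning techniques mentioned above, not an implementation of them: it uses plain fuzzy string matching from the Python standard library (`difflib.get_close_matches`), and the label list, tag names and sample lines are hypothetical.

```python
import difflib

# Hypothetical mapping from printed field labels to logical tags.
LABEL_TO_TAG = {
    "name": "customer_name",
    "address": "customer_address",
    "invoice number": "invoice_number",
    "total": "total_amount",
}


def tag_fields(ocr_lines, cutoff=0.6):
    """Associate 'label: value' lines from OCR output with logical tags.

    Labels are matched approximately, so common OCR confusions such as
    'rn' for 'm' or '1' for 'l' can still be resolved.
    """
    tagged = {}
    for line in ocr_lines:
        if ":" not in line:
            continue  # not a 'label: value' line
        label, value = line.split(":", 1)
        match = difflib.get_close_matches(
            label.strip().lower(), list(LABEL_TO_TAG), n=1, cutoff=cutoff
        )
        if match:
            tagged[LABEL_TO_TAG[match[0]]] = value.strip()
    return tagged


# Invented OCR output with typical character confusions.
ocr_output = [
    "Narne: Jane Doe",
    "lnvoice nurnber: 2024-001",
    "Tota1: 99.50",
    "Thank you for your business",
]
print(tag_fields(ocr_output))
```

A fixed label list like this only works for documents with a known layout. Handling generic documents, where labels and positions vary, is where learned models become necessary.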




SERVICES

In this context, we provide services tailored to the specific document handling needs of several areas: Public Administration; Finance, Banking, Insurance and Utilities; and Documentary Expertise. These services are described below.


Public administration

The relationship between citizens and the Public Administration and its services is governed by the administrative procedure, which guarantees the principle of equality for all citizens before the Administration.
Public administration is a vast and complex reality. Traditionally, Public Administration is understood in a double sense: organic and material. In the organic sense, public administration is the system of bodies, services and agents of the State and other public entities that aim at the regular and continuous satisfaction of collective needs. In the material sense, public administration is the activity itself carried out by those bodies, services and agents. In its organic sense, three large groups of entities can be distinguished within Public Administration:

  • Direct State administration;

  • Indirect State administration;

  • Autonomous Administration.


A fundamental part of this relationship between citizens and the Administration is the ability to streamline and improve document processes, which is what our solutions are designed to do. Today more than ever, the need for efficiency and effectiveness in Public Administration processes requires:

  • Cost Reduction;

  • Reduction in waiting times in processes;

  • Time savings in working hours;

  • Advances in interoperability, reuse of information and e-government.

Financial | Banking | Insurance | Utilities

Currently, despite the available telematic means, the start of a contractual relationship between customer and supplier is based on the information contained in physical paper documents. The provision of services, as well as the supply of consumer goods, often requires credit from a financial entity, and that credit is granted on the basis of the quality and authenticity of the documentation received.
To avoid incurring unnecessary costs, it is essential to assess the feasibility of the operation in real time, so suspicious documents must be detected during admission, before they enter the risk analysis or authorization channel. To meet these specific requirements, we develop solutions that automate these processes. They act from the moment the information is received, detecting suspicious documents in real time and validating identity documents and proof of address. At the same time, the data required by the applications is extracted automatically and the documents are scanned, which avoids the recurring costs of manual data recording, error correction and visual verification of the received documentation, as well as losses due to fraud.
These processes draw on advanced knowledge of Machine Learning, mathematical models of the distortions produced by printing and digitizing documents (the Print & Scan Channel) and Optical Document Processing.

Documentary Expertise

Clarifying the authenticity and integrity of a printed document is often a key issue in litigation. Nowadays, the usual practice is to use forensic document techniques such as ultraviolet or infrared light scanners. These techniques are valid for security documents such as ID cards or passports, but they are less effective on ordinary documents such as contracts, faxed or proofed documents.
The evolution of information technologies has made it possible for anyone to alter documents. Determining, for example, whether a supposedly printed or faxed document has been tampered with is difficult. Manipulation can be done simply with an image editor such as Photoshop or GIMP, without leaving any visually detectable traces.
Often, these fraudulent changes leave a trail that can be detected through advanced image processing techniques, specialized knowledge of the Print & Scan Channel and Machine Learning algorithms. We provide highly specialized knowledge for the preparation of expert reports that can contribute to a better understanding of the authenticity and integrity of documents.

Contact us to schedule a meeting