Why ICA is the real ECM

ICA will provide the enterprise with a rich field of information that can be used to generate predictions that we never previously considered possible.

Many years ago, medical companies struggled with how to manage, report, and provide drug development information to regulators. They were required to provide full details on clinical trials, drug investigations, adverse effects, doctor’s notes, and patient records in a consistent format for review. However, the information was held in many different formats, with Dictaphone tapes, X-ray images, and scanned documents all part of the information under management.

The solution was what came to be called enterprise content management (ECM). ECM was supposed to bring all types of content under one umbrella for management and distribution. Its drawback was that it couldn’t understand that content. Any descriptive information or analysis was added by users, for the most part, or via workflow processes that would, where possible, attempt to extract relevant information. In the case of medical image files, a system might extract information from DICOM metadata. However, none of this was analytics; the information was static and, in most cases, stale once entered into the system.

ECM systems encompass many different applications, such as web content management, document management, and image management systems. Each system may also include a records management module, but none of them actively integrates the content to perform content analytics. Some ECM vendors are trying to address the shortcomings of legacy ECM by bolting on additional modules; however, those modules will always be limited by the architecture of the original applications.

Now fast-forward to today, when content is predominantly generated as images and video. Manually adding metadata, or relying only on system-generated metadata, is no longer enough to gain insight into the data you hold. It is estimated that only 0.5 percent of all stored information is used to generate analytical projections or data that systems can learn from. Think for a moment how long it would take children to learn something if they could take in only 0.5 percent of the information presented to them.

It’s therefore necessary to build systems that operate the way humans do. Children learn fastest and retain the most knowledge when all their senses are engaged: seeing, listening, tasting, and feeling. While a computer system may not yet be able to taste or feel, it can listen, see, and read. That is why intelligent content analytics (ICA) matters: it allows systems to tap the 99.5 percent of unused information and to use it in the strongest and fastest possible way.

Haven’t we heard all this before? Autonomy’s Idol platform promised everything that has been suggested for ICA. In truth, it was an ICA solution in marketing and PR only. Idol was never able to process information in this way because the necessary advances had not yet happened. The custom hardware available today, with GPUs, TPUs, and specialized learning processors, has allowed algorithms that were once impossible to run at any scale to become universal.

Those advances, together with the vast amount of information being generated in easily accessible locations such as cloud storage and services, have allowed companies such as Microsoft, Amazon, Facebook, and Google to train and refine methods that translate language and pictures into meaning and system-understandable content.

Microsoft and Google indicated at the start of 2017 that the word error rate for speech-to-text was 6.3 percent. Then, within months, it was improved again, to below 5 percent.
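The word error rate quoted above is a concrete, computable metric: the edit distance between a hypothesis transcript and a reference transcript, counted in words and divided by the reference length. A minimal sketch in Python (the function name and example sentences are illustrative, not from any vendor's benchmark):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate (WER): minimum number of word substitutions,
    insertions, and deletions needed to turn the hypothesis into the
    reference, divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of about 0.17.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A 6.3 percent WER therefore means roughly one word in sixteen is wrong, which is close to human transcription performance on conversational speech.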

Image recognition, and the subsequent generation of descriptions, is further behind, but it will improve and undoubtedly catch up. As an example of why the error-rate number matters, Google has just released its translation earphones, which translate 40 languages in near real time. Without the hardware advances mentioned above, this would not have been possible. That is why we can state with confidence that Idol was not an ICA solution.

What does this mean for you?

Perhaps you are thinking that all this is nice to know, but how will it benefit you or your business? To answer that, let’s look at some use cases:

Car safety

We have already seen self-driving cars, with Tesla and others developing and advancing autonomous driving. Those systems already use video, sensors, and pictures, but what they do not use is sound. The systems of the future will be able to pick out sounds such as the screeching of car tires, which may indicate that an accident or traffic incident is ahead. Intelligent systems will listen for human voices or animal noises to help distinguish the objects they see, and will learn what those objects are and how to avoid them in seconds. This type of combined ICA and learning is not far away.

Medical diagnosis

To treat a patient effectively, a doctor already uses sounds, touch, pictures, and even patient feelings conveyed through verbal communication, in addition to diagnostics such as internal scans, ultrasound, and MRI. The doctor must assess all of those inputs to diagnose the likely cause and prescribe a remedy. With advances in ICA, a system could continually take in all the available feeds, listen to the patient, and transcribe what it hears into meaning and words that can be learned from and applied. And, as with the car, such a system would provide a rapid assessment and a suggested course of action.

How will this benefit enterprise users?

Looking beyond those examples, how specifically will this benefit enterprise users? The initial benefit will be the ability to learn from some of the 99.5 percent of untapped information they hold, and potentially to uncover risks or rewards.

However, the largest and most important advance of all will be the autogeneration of data for learning. One significant advance here is natural language generation (NLG). It is this process that can normalize data elements such as pictures, videos, and audio into a standard, consistent stream of information from which algorithms can be taught to build models, in much the same way as people learn.
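The normalization step described above can be pictured as a thin layer that wraps each modality's generated description in one uniform record. The sketch below is purely illustrative: the record fields, function names, and sample data are assumptions, and it presupposes that upstream models (image captioning, speech-to-text) have already produced a text description for each item.

```python
from dataclasses import dataclass

@dataclass
class ContentRecord:
    source_id: str   # identifier of the original content item
    modality: str    # "image", "video", "audio", ...
    text: str        # generated natural-language description

def normalize(item_id, modality, raw_description):
    """Wrap a modality-specific description (e.g. a caption from an
    image model or a transcript from speech-to-text) in a uniform
    record, so downstream learning code sees one consistent text
    stream regardless of the original format."""
    return ContentRecord(source_id=item_id,
                         modality=modality,
                         text=raw_description.strip().lower())

# Hypothetical items from two different modalities, reduced to one stream.
stream = [
    normalize("xray-001", "image", "Chest X-ray, no visible fracture"),
    normalize("call-17", "audio", "Patient reports mild dizziness"),
]
for record in stream:
    print(record.modality, "->", record.text)
```

The design point is simply that once every item is text, a single model or index can learn from all of it, which is what makes the untapped 99.5 percent usable.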

The human brain does an amazing job of normalizing input from all our senses into simple impulses, voltages, or chemical reactions. Just as the brain associates words with images, reinforcing its own ability to learn, ICA will provide the enterprise with a rich field of information that can be used to generate predictions that were never previously considered possible.

This article is published as part of the IDG Contributor Network.
