Intelligence from large texts:
transforming unstructured data

Text Analytics


In practice, the two approaches may not be so well differentiated because the enterprise queries may be “fuzzy” or the query system may be so fast that queries can just be thrown at the data until interesting responses are generated. But the transformation metaphor captures what is new and exciting about business intelligence and text analytics in particular.

Text analytics involves transforming unstructured texts into structured information. For an enterprise, the resulting structured information is complex and essentially multi-dimensional, so it is typically presented through a data visualisation that a user can readily understand. The user can then scan and analyse the visualisation to pick out, and follow up on, interesting patterns in the data.
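As a rough illustration of what "structured information" can mean here, the following minimal Python sketch turns a few texts into a document-by-term count table. This is one of the simplest structured representations a visualisation could be built on. It assumes a crude regular-expression tokenizer, and the function names and sample texts are hypothetical rather than taken from any particular product.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption for illustration): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def term_matrix(docs: dict[str, str]) -> tuple[list[str], dict[str, list[int]]]:
    """Turn free text into a structured document-by-term count table."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # The shared, sorted vocabulary defines the table's columns.
    vocabulary = sorted(set().union(*counts.values()))
    # Each row holds one document's count for every vocabulary term (0 if absent).
    rows = {doc_id: [c[term] for term in vocabulary] for doc_id, c in counts.items()}
    return vocabulary, rows


# Hypothetical sample texts.
docs = {"d1": "Acme buys Beta.", "d2": "Beta sues Acme. Acme responds."}
vocabulary, rows = term_matrix(docs)
print(vocabulary)  # ['acme', 'beta', 'buys', 'responds', 'sues']
print(rows)        # {'d1': [1, 1, 1, 0, 0], 'd2': [2, 1, 0, 1, 1]}
```

Real systems would add weighting, stemming, and far richer features. The point is only that free text becomes rows and columns that a chart or table can display.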

To reach this stage of visual representation, sophisticated software is needed to transform the words in a text document into a representation of the relations within the texts. In fact, the texts undergo several cascading transformations, all hidden from view, in which structure is added based on the relations between parts of the data. The newly structured text is then analysed further so that more structure is added, and so on.
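The cascade described above can be sketched as a chain of stages. Each stage reads the layers added by earlier stages and adds one of its own. The three stages here (naive sentence splitting, tokenization, and flagging capitalised tokens as candidate names) are hypothetical stand-ins chosen for brevity. Production systems use far more sophisticated components at each step.

```python
import re
from typing import Callable

Doc = dict  # the original text plus every annotation layer added so far


def split_sentences(doc: Doc) -> Doc:
    # Layer 1: naive sentence boundaries after '.', '!' or '?' followed by whitespace.
    doc["sentences"] = [s for s in re.split(r"(?<=[.!?])\s+", doc["text"].strip()) if s]
    return doc


def tokenize(doc: Doc) -> Doc:
    # Layer 2 builds on layer 1: words and punctuation marks for each sentence.
    doc["tokens"] = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return doc


def mark_candidates(doc: Doc) -> Doc:
    # Layer 3 builds on layer 2: capitalised tokens that do not start a sentence
    # are flagged as candidate names (a deliberately simple heuristic).
    doc["candidates"] = [
        tok
        for sent in doc["tokens"]
        for i, tok in enumerate(sent)
        if i > 0 and tok[:1].isupper()
    ]
    return doc


def run_pipeline(text: str, stages: list[Callable[[Doc], Doc]]) -> Doc:
    # Apply each stage in order; each one adds a new key to the shared document.
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "Yesterday Acme hired Jane Smith. She joined from Beta Corp.",
    [split_sentences, tokenize, mark_candidates],
)
print(result["candidates"])  # ['Acme', 'Jane', 'Smith', 'Beta', 'Corp']
```

Because every stage shares the same signature, new layers can be appended to the list without changing the earlier ones.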

For example, the first step may be to take the text and add part-of-speech information for each word. This may be followed by further linguistic analysis, such as identifying sentence constituents, or by statistical analysis of the text contents. To be useful, these analyses must be matched with knowledge about a particular domain: the named entities in the domain, the types of those entities, and their relations to each other. Who are the people, the companies, and the other important entities in the domain? What kinds of relationships exist among them? This representation of world knowledge may be provided by a specific ontology, or it may rely on information in the texts themselves.
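A toy version of matching text against domain knowledge might look like the following. The hand-built `ENTITIES` and `RELATIONS` tables are hypothetical stand-ins for an ontology. The sketch skips part-of-speech tagging entirely and only recognises an exact "entity, trigger verb, entity" pattern whose entity types agree with the relation's definition.

```python
import re

# Hypothetical toy ontology: known named entities and their types.
ENTITIES = {
    "Acme": "Company",
    "Beta Corp": "Company",
    "Jane Smith": "Person",
}

# Hypothetical relation types: trigger verb -> (type of left entity, type of right entity).
RELATIONS = {
    "acquired": ("Company", "Company"),
    "hired": ("Company", "Person"),
}


def find_entities(sentence: str) -> list[tuple[int, str]]:
    # Return (start offset, entity name) for every ontology entity mentioned, in text order.
    hits = []
    for name in ENTITIES:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            hits.append((m.start(), name))
    return sorted(hits)


def extract_relations(sentence: str) -> list[tuple[str, str, str]]:
    # Check each pair of adjacent entity mentions: the text between them must be a
    # known trigger verb, and the entity types must match that relation's signature.
    mentions = find_entities(sentence)
    triples = []
    for (i, left), (j, right) in zip(mentions, mentions[1:]):
        between = sentence[i + len(left):j].strip().lower()
        signature = RELATIONS.get(between)
        if signature == (ENTITIES[left], ENTITIES[right]):
            triples.append((left, between, right))
    return triples


print(extract_relations("Acme acquired Beta Corp last year."))  # [('Acme', 'acquired', 'Beta Corp')]
print(extract_relations("Jane Smith acquired Acme."))           # [] (a Person cannot acquire here)
```

The type check is where domain knowledge does its work: the same surface pattern is accepted or rejected depending on what the ontology says about the entities involved.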

In essence, text analytics exploits subtle regularities in texts: patterns that may be barely visible in any single document but that, taken en masse across a collection, can reveal very useful information.
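One simple way such regularities surface is through counting. A single co-mention of two entities says little, but a pair that keeps appearing together across many documents stands out. The sketch below assumes each document has already been reduced to a list of entity names (for instance by an entity-recognition step) and counts same-document co-occurrences. The input data is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs_entities: list[list[str]]) -> Counter:
    """Count how many documents mention each pair of entities together."""
    pairs: Counter = Counter()
    for entities in docs_entities:
        # De-duplicate within a document, then sort so (A, B) and (B, A) are the same pair.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical entity lists, one per document.
docs_entities = [
    ["Acme", "Beta Corp"],
    ["Beta Corp", "Acme", "Jane Smith"],
    ["Acme", "Beta Corp", "Acme"],
    ["Jane Smith", "Gamma Ltd"],
]
pairs = cooccurrence(docs_entities)
print(pairs.most_common(1))  # [(('Acme', 'Beta Corp'), 3)]
```

Raw counts favour frequent entities, so in practice they are often normalised (for example against how often each entity appears on its own) before being treated as meaningful.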

If you have read this far and are considering text analytics solutions for your organisation, your next step will be to work with a company offering these services. Together you can find out whether a configuration of your data (structured and unstructured), your software systems, and the software provided by the text analytics company can yield the kind of information you want.

© 2006 Michael Barlow