
Intelligence from large texts: transforming unstructured data
Text Analytics article
In practice, the two approaches may not be so well differentiated
because the enterprise queries may be “fuzzy” or the query system may be
so fast that queries can just be thrown at the data until
interesting responses are generated. But the transformation metaphor
captures what is new and exciting about business intelligence and text
analytics in particular.
Text analytics involves the transformation of unstructured texts into
structured information. The structured information from an enterprise,
which is complex and essentially multi-dimensional, is typically
presented via a data visualisation that is easily understood by a user.
The user can then scan and analyse the structured visualisation to pick
out and follow up on interesting patterns in the data.
To get to this stage of visual representation, sophisticated software
is needed to bring about the transformation of words in a text document
into a representation of relations in the texts. In fact, the texts
undergo several cascading transformations, all hidden from view, in
which structure is added based on the relations between parts of the
data. The newly structured text is then analysed further so that
more structure is added, and so on.
For example, the first step may well be to take the text and add
part-of-speech information for each word, which may be followed by
further linguistic analysis involving the identification of sentence
constituents or by statistical analyses of the text contents. To be
useful, these analyses must be matched with knowledge about a
particular domain: the named entities in the domain, the types of
entities, and their relations to each other. Who are the
people, the companies, and other important entities in the domain? What
kinds of relationship exist among the entities? This representation of
world knowledge may be provided by a specific ontology,
or it may rely on information in the texts themselves.
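
The cascade described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: the part-of-speech lexicon, the small ontology, the sample sentence, and all function names are hypothetical, and a real system would use a trained tagger and a curated domain ontology. Each stage takes the output of the previous one and adds a further layer of structure.

```python
import re

# Hypothetical toy resources, invented for illustration. A real system
# would use a trained tagger and a curated domain ontology instead.
POS_LEXICON = {"acquired": "VERB", "hired": "VERB", "the": "DET", "in": "ADP"}
ONTOLOGY = {"Acme": "Company", "Globex": "Company", "Alice": "Person"}


def tokenize(text):
    """Stage 1: split raw text into word tokens."""
    return [{"word": w} for w in re.findall(r"[A-Za-z]+", text)]


def tag_parts_of_speech(tokens):
    """Stage 2: add a part-of-speech label to each token."""
    for tok in tokens:
        # Crude fallback: capitalised unknown words are treated as proper nouns.
        default = "PROPN" if tok["word"][0].isupper() else "NOUN"
        tok["pos"] = POS_LEXICON.get(tok["word"].lower(), default)
    return tokens


def link_entities(tokens):
    """Stage 3: attach an ontology type to tokens that name known entities."""
    for tok in tokens:
        tok["entity"] = ONTOLOGY.get(tok["word"])
    return tokens


def extract_relations(tokens):
    """Stage 4: emit (subject, verb, object) for entity-verb-entity patterns."""
    relations = []
    last_entity = None
    pending_verb = None
    for tok in tokens:
        if tok["entity"]:
            if last_entity and pending_verb:
                relations.append((last_entity, pending_verb, tok["word"]))
            last_entity, pending_verb = tok["word"], None
        elif tok["pos"] == "VERB" and last_entity:
            pending_verb = tok["word"]
    return relations


def analyse(text):
    """Run the stages in order; each one enriches the same token list."""
    tokens = tokenize(text)
    for stage in (tag_parts_of_speech, link_entities):
        tokens = stage(tokens)
    return extract_relations(tokens)


print(analyse("Acme acquired Globex in March. Globex hired Alice."))
```

The sketch ignores sentence boundaries and handles only the simplest entity-verb-entity pattern. Its purpose is to show how linguistic analysis and domain knowledge combine to turn words into relations.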
In essence, text analytics exploits subtle regularities in texts. Taken
en masse, these regularities reveal very useful information.
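
One simple kind of regularity is how often two entities are mentioned together across a collection of texts. The following sketch counts such co-occurrences. The entity list, the documents, and the function name are assumptions made up for this example, and the plain substring matching is a deliberate simplification.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity list and documents, invented for illustration only.
ENTITIES = ["Acme", "Globex", "Initech"]

DOCUMENTS = [
    "Acme and Globex announced a joint venture.",
    "Analysts expect Globex to partner with Acme again.",
    "Initech reported quarterly results.",
    "Acme, Globex and Initech attended the trade fair.",
]


def cooccurrence_counts(documents, entities):
    """Count how many documents mention each pair of entities together."""
    counts = Counter()
    for doc in documents:
        # Naive substring matching; sorting keeps each pair in a fixed order.
        present = sorted(e for e in entities if e in doc)
        counts.update(combinations(present, 2))
    return counts


counts = cooccurrence_counts(DOCUMENTS, ENTITIES)
for pair, n in counts.most_common():
    print(pair, n)
```

No single document says much on its own, but over many documents the pair that keeps appearing together stands out, which is the kind of pattern a user would then follow up.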
If you have read this far and are thinking about using text analytics
solutions in your organization, then your next step will be to work
with a company offering these services to see whether a configuration
consisting of your data (structured and unstructured), your software
systems, and the software provided by the text analytics company can yield
the kind of information you want.