
Intelligence from large texts: transforming unstructured data
Reuters releases open API for the Calais Web Service. The Calais Web service enables publishers and other text workers to automatically metatag the people, places, facts and events in their content. The aim is to increase the content's search relevance and accessibility on the Web.
Lexalytics releases Salience 4.0. Lexalytics, Inc. (www.lexalytics.com) has released Salience 4.0. This latest version of the software builds in significant improvements to entity extraction, sentiment scoring and thematic extraction, in addition to several other new features.
Attensity: Advances in Text Analytics; Text Analytics for Insurance Early Warning and Detection. On-demand webinars.
Beyond Buzzword Bingo: Discover the Real Business Value of Search and Intelligence. Archived at Inxight.
Applying Text Analytics Solutions for Effective Claims Analysis. Attensity
Google/Inxight webinar. June 20. Slides available from Inxight.
The 4th Annual Text Analytics Summit was held June 22-23 in Boston. Some info from the conference is available.
6 AUGUST 2006
Data crunched by companies and government agencies is
typically quantitative. These numbers are manipulated within relational
databases to yield useful information. However, the intelligence
potentially available to organizations is much larger than what is
garnered from these traditional sources. Note the phrase
“potentially available”. How do we get access to
this vast potential resource? The problem is that useful business
intelligence is buried within large amounts of text data, such as
company documents, emails, customer survey reports, and so on. Text
documents are structured for reading by people, but they are
unstructured as far as data extraction is concerned. The
essence of text analytics is to take
very large unstructured text documents and extract useful business
intelligence.
Before examining text analytics in more detail, let’s
consider a range of ways to extract data from large texts. We can
distinguish two broad approaches: queries and transformations.
Queries. One
way to extract information from large texts is to formulate a query.
Once a query is specified, software routines trawl through the text to
provide a response to the query. An example of a response may be
something as simple as a list of all instances of the words
“IBM” and “UIMA” that occur
within a certain span of words, say strings of 10 words or so. The
queries and the responses may be more complex than this, but what
characterizes a query is that you have to specify it in advance. To
formulate a good query, you have to know what you want to know, and
then decide how to structure the query, within the constraints of the
query system software, to obtain the desired results. In other words,
you must decide beforehand what you are looking for, and you must make
assumptions about the kind of information the text documents contain.
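The IBM/UIMA example can be made concrete with a minimal Python sketch. It assumes a crude regex tokenizer and reads "within a span of 10 words" as both terms falling inside the same 10-word window; the function name `proximity_query` is hypothetical and is not part of UIMA or any product mentioned here.

```python
import re

def proximity_query(text, term_a, term_b, window=10):
    """Return the word spans in which term_a and term_b both fall inside
    a window of `window` consecutive words (hypothetical helper)."""
    # Crude tokenizer (an assumption): runs of letters, digits and underscores.
    words = re.findall(r"\w+", text)
    lowered = [w.lower() for w in words]
    a, b = term_a.lower(), term_b.lower()
    hits = []
    for i, wi in enumerate(lowered):
        if wi != a:
            continue
        for j, wj in enumerate(lowered):
            # Both terms fit in one window when their positions differ by < window.
            if wj == b and abs(i - j) < window:
                start, end = min(i, j), max(i, j)
                hits.append(" ".join(words[start:end + 1]))
    return hits

doc = ("IBM announced that UIMA, its framework for unstructured information, "
       "is now available. Analysts expect wide adoption.")
print(proximity_query(doc, "IBM", "UIMA"))  # ['IBM announced that UIMA']
```

Note that the caller has to name both terms and the window size up front; the sketch can only find what it was told to look for.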
Transformations.
A query can be considered to be a request to reveal specified data
patterns hidden within a text. An alternative way to deal with texts is
to give a request along the lines of: “transform yourself to
reveal interesting data patterns”. A simple example of this
notion might be a request for a summary of a document.
Following this transformation metaphor, running summarization software
on a document can be viewed as asking the document to transform itself
into a summary.
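To make the transformation idea concrete, here is a minimal sketch of one classic way to summarize: frequency-based extractive summarization, which scores each sentence by how common its content words are across the whole document and keeps the top scorers. This is a swapped-in illustrative technique, not the method of any vendor mentioned here; the stop-word list, the sentence splitter and the name `summarize` are assumptions. No query terms are supplied; the document's own word statistics decide what survives.

```python
import re
from collections import Counter

# Small hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "its", "that", "this", "for", "on", "with", "as", "by", "be"}

def summarize(text, n_sentences=2):
    """Frequency-based extractive summary: keep the n highest-scoring
    sentences, in their original order (hypothetical helper)."""
    # Crude sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide frequencies of content (non-stop) words.
    freq = Counter(w.lower() for w in re.findall(r"\w+", text)
                   if w.lower() not in STOP_WORDS)

    def score(sentence):
        content = [w.lower() for w in re.findall(r"\w+", sentence)
                   if w.lower() not in STOP_WORDS]
        # Average rather than total, so long sentences are not favored by length alone.
        return sum(freq[w] for w in content) / len(content) if content else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)

doc = ("Text analytics finds patterns in text. The weather was pleasant. "
       "Text analytics tools extract entities from text.")
print(summarize(doc, n_sentences=1))  # Text analytics finds patterns in text.
```

Averaging the word scores is a design choice: summing them would tend to reward the longest sentences regardless of content.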
Both queries and transformations are useful and have their place. One
interesting aspect of a transformation approach is that few assumptions
are made about the content of the data patterns in the texts. If you
want a broad picture of the content of texts, then in adopting a
transformational approach, you are giving the data patterns a chance to
reveal themselves. If, on the other hand, you know
you want to find out about IBM and UIMA, then a query is the right way
to go. You know what you are looking for and you know which entities
are relevant.
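The contrast can be sketched in code as well. Unlike a query, which names IBM and UIMA up front, the hypothetical function below takes no search terms: it counts which pairs of content words share a sentence and reports the most frequent, letting co-occurrence patterns surface on their own. It is a minimal sketch assuming sentence-level co-occurrence, a crude regex tokenizer and a small hand-picked stop-word list.

```python
import re
from collections import Counter
from itertools import combinations

# Small hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "its", "that", "this", "for", "on", "with", "as", "by", "be"}

def cooccurring_pairs(text, top_n=5):
    """Count unordered pairs of content words that share a sentence and
    return the top_n most frequent (hypothetical helper; no query terms)."""
    pair_counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # A set, so each pair is counted at most once per sentence.
        terms = {w.lower() for w in re.findall(r"\w+", sentence)} - STOP_WORDS
        # Sorting gives each unordered pair one canonical (alphabetical) form.
        for pair in combinations(sorted(terms), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(top_n)

doc = "IBM released UIMA. UIMA is backed by IBM. The weather was mild."
print(cooccurring_pairs(doc, top_n=1))  # [(('ibm', 'uima'), 2)]
```

Here the IBM/UIMA association is discovered rather than requested; on a real corpus the same loop would surface whatever pairings the texts happen to contain.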
6 AUGUST 2006
Finding the best reviewers for particular grant applications (PDF). Content Analyst.