Tuesday, January 20, 2009

Jurafsky & Martin's Speech and Language Processing

Currently, I am taking a "Computation and Linguistic Analysis" class here at IU Linguistics with Prof. Sandra Kuebler. We are using Jurafsky & Martin's (2009) Speech and Language Processing (2nd edition). The book seems really useful, and the authors motivate their readers from the very first sentence: "This is an exciting time to be working in speech and language processing." In the introductory chapter (chapter 1), the authors start by spelling out the various terms used to refer to the field of "natural language processing" (NLP), such as "speech and language processing", "human language technology", "computational linguistics", and "speech recognition and synthesis". People working in this field would, I believe, agree that these terms do not all refer to exactly the same thing, but I think everybody will also agree that they overlap a great deal. For me, this multidisciplinary nature of the field is one of the first things that makes NLP exciting.

The authors then give examples of subfields and tasks that make use of NLP, such as "conversational agents", "dialogue systems", "machine translation", and "question answering". Next, they briefly describe the levels of linguistic knowledge (e.g., phonetics, phonology, morphology, syntax, semantics, pragmatics, and discourse) needed for speech and language processing, arguing that it is this kind of linguistic knowledge that distinguishes language processing applications from other kinds of data processing.

The authors then use the concept of ambiguity to frame the mission of NLP: most NLP tasks, they argue, can be seen as attempts to resolve ambiguity at one or more of these linguistic levels. They also give examples of the kinds of disambiguation involved, including "part of speech tagging", "word sense disambiguation", "lexical and syntactic disambiguation", "probabilistic parsing", and "speech act interpretation".

Models and algorithms are then presented as the means of resolving these ambiguities. The main models and algorithms in the field are mentioned, with a promise of more detail in later chapters. The models are classified into three types (state machines, formal rule systems, and models based on logic), and examples of each type are provided.

The authors then briefly discuss the early hope (or, for some people, the promise) of computational linguistics: that computers would be able to process natural language as intelligently as we humans do. They talk about Alan Turing's (1950) paper and the Turing test, and then about ELIZA, an early NLP system (see Weizenbaum, 1966).

The chapter ends with an interesting "state of the art" section, in which current trends in computational linguistics are situated within a historical context organized both chronologically and thematically.
