- Context
This course was an elective taught to final-year undergraduates on the University of York Computer Science Department's main computing B.Sc. degrees. It went through several incarnations: in 1990 and 1991 it was a single unit (18 hours of lectures); in 1992, 1993 and 1994 it was merged with another course I taught on more general knowledge-based systems to form a double unit (36 lectures) entitled "Natural Language Interfaces to Knowledge-Based Systems"; in 1995, 1996 and 1997 it became a double unit exclusively concerned with natural language processing.
- Workload
- Lectures: 36 × 50-minute lectures.
- Private study: 141 hours (including revision).
- Assessment: 3 hours (excluding revision).
- Prerequisites
Students need a thorough knowledge of both the theory and practice of logic programming. Additionally, the course is suitable only for those students who have some knowledge of and interest in human languages. At a minimum, they need to know a noun from a verb and a plural noun from a singular noun.
- Assessment
An unseen 3-hour paper (worth 100 marks): answer 4 questions (25 marks each) from 6.
- Description
In this course we look at the theory and techniques that are used in building computer systems that can "understand" a natural language such as English. We concentrate on the four topics of syntax, parsing, semantics and semantic translation. We look at the formalisms used to capture syntactic and semantic knowledge and the algorithms that can use that knowledge. While the focus is strongly theoretical, one by-product of the course is that we will build a simple natural language interface to a Prolog database.
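As a rough illustration of that by-product, the sketch below answers simple statements and "who" questions against a small set of facts. It is written in Python purely for illustration (the course itself used Prolog), and the facts, vocabulary and function names (`FACTS`, `verb_meaning`, `answer`) are all hypothetical, not taken from the course material.

```python
# Hypothetical toy fact base, standing in for a Prolog database.
FACTS = {
    ("teaches", "alice", "logic"),
    ("teaches", "bob", "parsing"),
    ("likes", "alice", "parsing"),
}
NAMES = {"alice", "bob", "logic", "parsing"}   # proper nouns / constants
VERBS = {"teaches", "likes"}                   # transitive verbs


def verb_meaning(verb):
    """Lambda-calculus style meaning: \\obj. \\subj. verb(subj, obj)."""
    return lambda obj: lambda subj: (verb, subj, obj) in FACTS


def parse_vp(words):
    """VP -> V NP. Returns the VP meaning: a predicate over subjects."""
    if len(words) != 2 or words[0] not in VERBS or words[1] not in NAMES:
        raise ValueError("cannot parse verb phrase: %r" % (words,))
    return verb_meaning(words[0])(words[1])


def answer(sentence):
    """S -> NP VP. Apply the VP meaning to the subject, or enumerate for 'who'."""
    words = sentence.lower().rstrip("?.").split()
    if not words:
        raise ValueError("empty sentence")
    subject, predicate = words[0], parse_vp(words[1:])
    if subject == "who":
        return sorted(name for name in NAMES if predicate(name))
    if subject in NAMES:
        return predicate(subject)
    raise ValueError("unknown subject: %r" % subject)
```

Under these assumptions, `answer("Alice teaches logic.")` evaluates the statement against the facts, while `answer("Who teaches parsing?")` returns the list of names that satisfy the verb phrase. The sentence meaning is built by applying the verb phrase meaning to the subject, which is the compositional idea in miniature.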
- Aims
- Students should become aware of the issues involved in various forms of knowledge representation (for syntactic, semantic, pragmatic and world knowledge) that are needed for language processing.
- Students should become aware of the issues involved in various forms of machine reasoning (parsing, semantic translation, disambiguation, inference drawing) that are needed for language processing.
- Students should learn how to use the theoretical knowledge they gain in the construction of simple application systems (primarily, simple database interfaces).
- Content
- Syntax: grammar in computational linguistics; context-free grammars; a simple grammar of English; unification grammars; treatment of agreement and subcategorisation; unbounded dependencies.
- Parsing: types of parser and their complexity; simple top-down parsing in Prolog; simple bottom-up parsing in Prolog; the role of compilation; chart parsing; near-LR(k) parsing.
- Semantics and semantic translation: truth-conditions; compositionality; the role of logic; the lambda-calculus; the semantics of a fragment of English; semantic translation in Prolog; the semantics of quantified noun phrases; computing the scopes of quantified noun phrases using Cooper storage.
- Other topics: lexical semantics, disambiguation, interfaces to databases.
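To give a flavour of the syntax and parsing topics listed above, here is a minimal sketch of a top-down parser for a tiny grammar of English with subject-verb and determiner-noun number agreement. It is in Python for illustration only (the course used Prolog); generators stand in for Prolog backtracking, and the grammar, lexicon and function names are hypothetical. Subcategorisation is deliberately not modelled, so every verb may take an optional object.

```python
# Hypothetical lexicon: word -> list of (category, number) entries.
LEXICON = {
    "the": [("Det", "sg"), ("Det", "pl")],
    "a": [("Det", "sg")],
    "dog": [("N", "sg")],
    "dogs": [("N", "pl")],
    "cat": [("N", "sg")],
    "cats": [("N", "pl")],
    "sees": [("V", "sg")],
    "see": [("V", "pl")],
    "sleeps": [("V", "sg")],
    "sleep": [("V", "pl")],
}


def word(cat, words):
    """Yield (tree, number, rest) for each entry of the first word in category cat."""
    if words:
        for entry_cat, number in LEXICON.get(words[0], []):
            if entry_cat == cat:
                yield (cat, words[0]), number, words[1:]


def parse_np(words):
    """NP[n] -> Det[n] N[n]: determiner and noun must agree in number."""
    for det, det_num, rest in word("Det", words):
        for noun, noun_num, rest2 in word("N", rest):
            if det_num == noun_num:
                yield ("NP", det, noun), noun_num, rest2


def parse_vp(words):
    """VP[n] -> V[n] | V[n] NP: the object's number is unconstrained."""
    for verb, number, rest in word("V", words):
        yield ("VP", verb), number, rest
        for obj, _obj_num, rest2 in parse_np(rest):
            yield ("VP", verb, obj), number, rest2


def parse(sentence):
    """S -> NP[n] VP[n]: return every parse tree that consumes all the words."""
    words = sentence.lower().split()
    return [
        ("S", subj, pred)
        for subj, subj_num, rest in parse_np(words)
        for pred, verb_num, rest2 in parse_vp(rest)
        if subj_num == verb_num and not rest2
    ]
```

With this lexicon, `parse("the dog sees a cat")` yields a single tree, whereas `parse("the dogs sees a cat")` yields an empty list because the subject and verb disagree in number. Passing the number feature up from each constituent is a crude stand-in for the feature unification that a unification grammar would perform.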
- Teaching material
- There are substantial LaTeX course notes that I have yet to place here (for various reasons).
- For some time I had been thinking of compiling a web page of natural language sentences that could be used for teaching purposes. I shall never find time to do it, so I'm pleased to link to someone else's collection of ambiguous sentences.
- Exam papers (gzipped PostScript):
- Recommended books
The following are general textbooks. Students will also need to consult the research literature.
- *** Pereira F.C.N. and Shieber S.M.: Prolog for Natural Language Analysis, Stanford University: Center for the Study of Language and Information, 1987
- *** Gazdar G. and Mellish C.: Natural Language Processing in Prolog, Addison-Wesley, 1989
- *** Allen J.: Natural Language Understanding (2nd edn.), Benjamin/Cummings, 1994