- Context
This course was an elective taught to final-year undergraduates on the
University of York Computer Science Department's main computing B.Sc.
degrees.
It went through several incarnations: in 1990 and 1991 it was a
single unit (18 hours of lectures); in 1992, 1993 and 1994 it was merged with
another course I taught on more general knowledge-based systems to form
a double unit (36 lectures) entitled "Natural Language Interfaces to
Knowledge-Based Systems"; in 1995, 1996 and 1997 it became a double unit
exclusively concerned with natural language processing.
- Workload
- Lectures: 36 × 50-minute lectures.
- Private study: 141 hours (including revision).
- Assessment: 3 hours (excluding revision).
- Prerequisites
Students need a thorough knowledge of both the theory and practice of
logic programming.
Additionally, the course is suitable only for those students who have
some knowledge of and interest in human languages.
At a minimum, they need to know a noun from a verb
and a plural noun from a singular noun.
- Assessment
An unseen 3-hour paper (worth 100 marks):
answer 4 questions (25 marks each) from 6.
- Description
In this course we look at the theory and techniques that are used in
building computer systems that can "understand" a natural language such as
English. We concentrate on the four topics of syntax, parsing, semantics
and semantic translation. We look at the formalisms used to capture
syntactic and semantic knowledge and the algorithms that can use that
knowledge. While the focus is strongly theoretical, one by-product of the
course is that we will build a simple natural language interface to a
Prolog database.
- Aims
- Students should become aware of the issues involved in various forms of
knowledge representation (for syntactic, semantic, pragmatic and world
knowledge) that are needed for language processing.
- Students should become aware of the issues involved in various forms
of machine reasoning (parsing, semantic translation, disambiguation,
inference drawing) that are needed for language processing.
- Students should learn how to use the theoretical knowledge they gain
in the construction of simple application systems (primarily, simple
database interfaces).
- Content
- Syntax: grammar in computational linguistics; context-free grammars;
a simple grammar of English; unification grammars; treatment of agreement
and subcategorisation; unbounded dependencies.
- Parsing: types of parser and their complexity; simple top-down parsing
in Prolog; simple bottom-up parsing in Prolog; the role of compilation;
chart parsing; near-LR(k) parsing.
- Semantics and semantic translation: truth-conditions; compositionality;
the role of logic; the lambda-calculus; the semantics of a fragment of
English; semantic translation in Prolog; the semantics of quantified noun
phrases; computing the scopes of quantified noun phrases using Cooper
storage.
- Other topics: lexical semantics; disambiguation; interfaces to
databases.
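As a rough illustration of the syntax and parsing topics listed above, here is a minimal sketch of a top-down (recursive-descent) parser for a tiny context-free grammar, extended with number agreement and verb subcategorisation. It is written in Python rather than the Prolog used on the course, and the grammar, the lexicon and every function name are invented for this example; none of it is taken from the course notes.

```python
# Toy lexicon, invented for illustration: word -> (category, number, subcat).
# number None means "compatible with singular or plural";
# subcat is only meaningful for verbs.
LEXICON = {
    "the":    ("Det", None, None),
    "a":      ("Det", "sg", None),
    "dog":    ("N", "sg", None),
    "dogs":   ("N", "pl", None),
    "cat":    ("N", "sg", None),
    "barks":  ("V", "sg", "intrans"),
    "bark":   ("V", "pl", "intrans"),
    "chases": ("V", "sg", "trans"),
    "chase":  ("V", "pl", "trans"),
}


def agree(a, b):
    """Two number values agree if either is unspecified or they are equal."""
    return a is None or b is None or a == b


def merge(a, b):
    """Combine two agreeing number values, preferring the specified one."""
    return a if a is not None else b


def parse_word(cat, words, i):
    """Yield (tree, number, subcat, next_position) if words[i] has category cat."""
    if i < len(words):
        entry = LEXICON.get(words[i])
        if entry is not None and entry[0] == cat:
            yield (cat, words[i]), entry[1], entry[2], i + 1


def parse_np(words, i):
    """NP -> Det N, where Det and N must agree in number."""
    for det, n1, _, j in parse_word("Det", words, i):
        for noun, n2, _, k in parse_word("N", words, j):
            if agree(n1, n2):
                yield ("NP", det, noun), merge(n1, n2), k


def parse_vp(words, i):
    """VP -> V (intransitive) | V NP (transitive), chosen by the verb's subcat."""
    for verb, num, subcat, j in parse_word("V", words, i):
        if subcat == "intrans":
            yield ("VP", verb), num, j
        elif subcat == "trans":
            # The object's number is not constrained by the verb.
            for obj, _, k in parse_np(words, j):
                yield ("VP", verb, obj), num, k


def parse_s(words, i=0):
    """S -> NP VP, where subject and verb must agree in number."""
    for np, n1, j in parse_np(words, i):
        for vp, n2, k in parse_vp(words, j):
            if agree(n1, n2):
                yield ("S", np, vp), k


def parse(sentence):
    """Return every parse tree that spans the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse_s(words) if end == len(words)]
```

Calling `parse("the dogs chase a cat")` returns a single tree, while `parse("the dogs barks")` and `parse("the dog chases")` return empty lists because of an agreement failure and a missing object respectively. The generators stand in for Prolog's backtracking: each one yields every way of parsing from a given position, so alternatives are tried in turn.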
- Teaching material
- There are substantial LaTeX course notes that I have yet to
place here (for various reasons).
- For some time I had been thinking of compiling a web page of
natural language sentences that could be used for teaching purposes.
I shall never find time to do it, so I'm pleased to link to someone
else's collection of ambiguous sentences.
- Exam papers (gzipped PostScript):
- Recommended books
The following are general textbooks. Students will also need to consult
the research literature.
- *** Pereira F.C.N. and Shieber S.M.: Prolog and Natural-Language Analysis,
Stanford University: Center for the Study of Language and Information, 1987
- *** Gazdar G. and Mellish C.: Natural Language Processing in Prolog,
Addison-Wesley, 1989
- *** Allen J.: Natural Language Understanding (2nd edn.),
Benjamin/Cummings, 1994