Information Retrieval

Learning Outcomes: 
Explain the concepts of indexing, vocabulary, normalization and dictionary in Information Retrieval
Define a boolean model and a vector space model, and explain the differences between them
Explain the differences between classification and clustering
Discuss the differences between different classification and clustering methods
Choose a suitable classification or clustering method depending on the problem constraints at hand
Implement classification in a boolean model and a vector space model
Implement a basic clustering method
Give an account of a basic spectral method
Evaluate information retrieval algorithms, and give an account of the difficulties of evaluation
Explain the basics of XML and Web search.
Syllabus: 
Unit No. / Topics
1
Introduction
Basics of Information Retrieval and Introduction to Search Engines; Boolean Retrieval: Boolean queries, Building simple indexes, Processing Boolean queries
2
Term Vocabulary and Posting Lists
Choosing document units, Selection of terms, Stop word elimination, Stemming and lemmatization, Skip lists, Positional postings and Phrase queries; Dictionaries and Tolerant Retrieval: Data structures for dictionaries, Wildcard queries, Permuterm and K-gram indexes, Spelling correction, Phonetic correction
3
Index Construction
Single-pass scheme, Distributed indexing, MapReduce, Dynamic indexing; Index Compression: Statistical properties of terms, Zipf's law, Heaps' law, Dictionary compression, Postings file compression, Variable byte codes, Gamma codes
4
Vector Space Model
Parametric and zone indexes, Learning weights, Term frequency and weighting, Tf-idf weighting, Vector space model for scoring, Variant tf-idf functions
5
Computing Scores in a Complete Search System
Efficient scoring and ranking, Inexact retrieval, Champion lists, Impact ordering, Cluster pruning, Tiered indexes, Query term proximity, Vector space scoring and query operations
6
Evaluation in Information Retrieval
Standard test collections, Unranked retrieval sets, Ranked retrieval results, Assessing relevance, User utility, Precision and Recall, Relevance feedback, Rocchio algorithm, Probabilistic relevance feedback, Evaluation of relevance feedback
7
Probabilistic Information Retrieval
Review of basic probability theory, Probability ranking principle, Binary independence model, Probability estimates, Probabilistic approaches to relevance feedback; Text Classification: Rocchio classifier, K-nearest-neighbor classifier, Linear and nonlinear classifiers, Bias-variance tradeoff, Naïve Bayes and Support Vector Machine based classifiers
8
Text Clustering
Clustering in information retrieval, Evaluation of clustering, K-means and Hierarchical clustering; Introduction to Linear Algebra, Latent Semantic Indexing
Text Books: 
Name : 
An Introduction to Information Retrieval
Author: 
C. D. Manning, P. Raghavan, and H. Schütze
Publication: 
Cambridge University Press, 2009.
Reference Books: 
Name: 
Modern Information Retrieval
Author: 
R. Baeza-Yates and B. Ribeiro-Neto
Publication: 
Pearson Education, 1999
Syllabus PDF: 
Sem 5 BDA-Information Retrieval.pdf (191.24 KB)
Branch: 
BDA
Course: 
2018
Stream: 
B.Tech