CS 429: Information Retrieval

Reference book: https://nlp.stanford.edu/IR-book/information-retrieval-book.html

Project built: https://github.com/haard7/IR-Project-A20540508

Key Learnings

  • Boolean Retrieval, Indexing and Scoring

    • Learned how the inverted index serves as the core data structure for indexing and retrieving documents
    • Learned about wildcard queries and spelling correction for more robust querying
  • Index Construction and Compression

    • Building an index that maps every term to the documents containing it is a difficult task, because a large collection does not fit in memory. I learned several index construction techniques, including Block Sort-Based Indexing (BSBI) and Single-Pass In-Memory Indexing (SPIMI).
  • Evaluation and Relevance Feedback

    • Relevance feedback plays an important role in improving the performance of an IR system. I studied methods ranging from the Rocchio algorithm to probabilistic relevance feedback, and also learned how to evaluate both an IR system and the effect of relevance feedback.
  • Probabilistic IR and Language Modeling

    • Binary Independence Model
    • Bayes' theorem and its use cases in IR
    • Language modeling is another way of thinking about retrieval: documents are ranked by how likely each document's language model is to generate the query.
  • Text classification and Vector Space Classification

    • Naïve Bayes model and Bernoulli model for classification
    • Feature Selection methods
  • Clustering (Flat and Hierarchical)

    • Implementation of K-Means Clustering
    • How to select the value of k
    • Hard and Soft assignment

    This course was not only about the underlying theory of search engines; it also included an end-to-end project building an IR system: collecting documents through web crawling, indexing them, and building a query processor.
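
The inverted index and Boolean retrieval from the first topic can be illustrated with a minimal sketch. It assumes naive whitespace tokenization and integer document IDs; the toy documents and the helper names (build_inverted_index, intersect, boolean_and) are hypothetical and not taken from the linked project.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of doc IDs (docs: doc_id -> text)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Linear-time merge (AND) of two sorted postings lists."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def boolean_and(index, terms):
    """Conjunctive query; processes the shortest postings lists first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_inverted_index(docs)
print(boolean_and(index, ["home", "sales", "july"]))  # [2, 3, 4]
```

Starting from the shortest postings list keeps intermediate results small, which is the usual ordering heuristic for conjunctive queries.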
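
For spelling correction, one standard approach is to suggest the vocabulary term with the smallest Levenshtein (edit) distance to the query term. The sketch below uses the usual dynamic-programming recurrence; the correct helper and the tiny vocabulary are hypothetical, and a practical system would first narrow the candidates (for example with a k-gram index) instead of scanning the whole vocabulary.

```python
def edit_distance(s1, s2):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    m, n = len(s1), len(s2)
    # dp[i][j] = distance between the first i chars of s1 and first j chars of s2
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete s1[i-1]
                dp[i][j - 1] + 1,         # insert s2[j-1]
                dp[i - 1][j - 1] + cost,  # substitute (or copy if equal)
            )
    return dp[m][n]


def correct(query_term, vocabulary):
    """Hypothetical helper: closest vocabulary term, ties broken alphabetically."""
    return min(vocabulary, key=lambda v: (edit_distance(query_term, v), v))


vocabulary = ["board", "border", "boardroom", "lord", "morbid"]
print(edit_distance("kitten", "sitting"))  # 3
print(correct("bord", vocabulary))         # board
```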
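
SPIMI can be sketched in a simplified, in-memory form, assuming the token stream yields (term, doc_id) pairs in increasing doc_id order. Unlike BSBI, which collects termID-docID pairs and sorts them per block, SPIMI appends postings directly to a per-block dictionary and sorts only the terms when the block is full. In this sketch the blocks stay in a Python list instead of being written to disk, and max_postings_per_block is a hypothetical stand-in for a memory budget.

```python
import heapq
from itertools import groupby


def spimi_invert(token_stream, max_postings_per_block):
    """Yield term-sorted blocks, SPIMI-style (kept in memory in this sketch).

    token_stream yields (term, doc_id) pairs in increasing doc_id order. Each
    block is a dictionary term -> postings list, filled directly as tokens
    arrive, with no sort of (term, doc_id) pairs.
    """
    block, size = {}, 0
    for term, doc_id in token_stream:
        postings = block.setdefault(term, [])  # new terms get an empty list
        if not postings or postings[-1] != doc_id:  # skip repeats within a doc
            postings.append(doc_id)
            size += 1
        if size >= max_postings_per_block:  # "memory" full: emit the block
            yield sorted(block.items())  # sort terms so blocks can be merged
            block, size = {}, 0
    if block:
        yield sorted(block.items())


def merge_blocks(blocks):
    """k-way merge of term-sorted blocks into one index: term -> postings."""
    merged = heapq.merge(*blocks, key=lambda item: item[0])
    index = {}
    for term, group in groupby(merged, key=lambda item: item[0]):
        postings = []
        for _, block_postings in group:
            postings.extend(block_postings)
        # A document can straddle two blocks, so drop duplicate doc IDs.
        index[term] = sorted(set(postings))
    return index


docs = {1: "caesar was killed", 2: "brutus killed caesar", 3: "caesar was ambitious"}
stream = ((term, doc_id) for doc_id, text in sorted(docs.items())
          for term in text.split())
blocks = list(spimi_invert(stream, max_postings_per_block=4))
index = merge_blocks(blocks)
print(len(blocks), index["caesar"])  # 3 [1, 2, 3]
```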
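
The Rocchio algorithm mentioned under relevance feedback moves the query vector toward the centroid of the judged-relevant documents and away from the centroid of the judged-nonrelevant ones: q_m = alpha * q0 + beta * centroid(relevant) - gamma * centroid(nonrelevant). The sketch below is a minimal version over plain Python lists; the defaults alpha = 1, beta = 0.75, gamma = 0.15 are commonly suggested starting values rather than anything prescribed by the course, and the 3-term vectors are a toy example.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update; negative term weights are clipped to zero."""
    zero = [0.0] * len(query)
    rel = centroid(relevant) if relevant else zero
    nonrel = centroid(nonrelevant) if nonrelevant else zero
    return [max(0.0, alpha * q + beta * r - gamma * s)
            for q, r, s in zip(query, rel, nonrel)]


# Toy 3-term vector space: weights for (term_a, term_b, term_c).
q0 = [1.0, 0.0, 0.0]
relevant_docs = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
nonrelevant_docs = [[0.0, 0.0, 1.0]]
print(rocchio(q0, relevant_docs, nonrelevant_docs))  # [1.75, 0.375, 0.0]
```

The modified query picks up weight for term_b, which appears in a relevant document but not in the original query; term_c would go negative and is clipped to zero.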
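
Evaluating an IR system usually starts with precision and recall for an unranked result set, and average precision for a ranked one (mean average precision is then the mean of this value over a set of queries). A minimal sketch, assuming binary relevance judgments; the ranking and judgments below are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F1 (F with beta = 1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(ranking, relevant):
    """Mean of precision@k over ranks k holding a relevant doc; relevant docs
    that are never retrieved contribute a precision of 0."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


ranking = ["d3", "d1", "d7", "d2", "d5"]  # system output, best first
relevant = {"d1", "d2", "d9"}             # hypothetical judgments
print(precision_recall_f1(ranking, relevant))
print(average_precision(ranking, relevant))
```

Here 2 of the 5 retrieved documents are relevant (precision 0.4) and 2 of the 3 relevant documents are found (recall 2/3); the relevant hits at ranks 2 and 4 give precisions 1/2 and 2/4, so average precision is (0.5 + 0.5 + 0) / 3.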
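
In the Binary Independence Model, Bayes' theorem combined with the term-independence assumption reduces ranking by the odds of relevance to summing, over query terms that occur in a document, a log odds ratio c_t. The sketch uses add-1/2 smoothed estimates of p_t (term present in relevant docs) and u_t (term present in nonrelevant docs); with no relevance judgments (S = s = 0), c_t becomes log((N - df_t + 0.5) / (df_t + 0.5)), an idf-like weight. The document frequencies below are made-up numbers for illustration.

```python
import math


def term_weight(N, df, S=0, s=0):
    """BIM log odds ratio c_t with add-1/2 smoothing.

    N: documents in the collection, df: documents containing the term,
    S: known relevant documents, s: known relevant documents containing the term.
    """
    p_odds = (s + 0.5) / (S - s + 0.5)                # p_t / (1 - p_t)
    u_odds = (df - s + 0.5) / (N - df - S + s + 0.5)  # u_t / (1 - u_t)
    return math.log(p_odds / u_odds)


def rsv(query_terms, doc_terms, df, N):
    """Retrieval status value: sum of c_t over query terms that occur in the doc."""
    return sum(term_weight(N, df[t]) for t in set(query_terms) & set(doc_terms))


N = 1000
df = {"brutus": 5, "caesar": 100, "the": 990}  # made-up document frequencies
doc = {"brutus", "killed", "caesar"}
print(rsv(["brutus", "caesar"], doc, df, N))
```

Passing S and s from relevance judgments turns the same function into probabilistic relevance feedback: terms that occur in many known-relevant documents get larger weights.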
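
The query-likelihood view of language modeling ranks documents by P(q | M_d), the probability that a unigram model estimated from document d generates the query. The sketch uses Jelinek-Mercer smoothing, mixing the document model with a collection model so that a query term missing from one document does not zero out its score; lam = 0.5 and the two toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(q | M_d) under a unigram model with Jelinek-Mercer smoothing:
    P(t | d) = lam * tf(t, d) / |d| + (1 - lam) * cf(t) / |C|."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for t in query:
        p = lam * doc_tf[t] / len(doc) + (1 - lam) * coll_tf[t] / len(collection)
        if p == 0.0:  # zero probability, e.g. term absent from the collection
            return float("-inf")
        score += math.log(p)
    return score


d1 = "xyzzy reports a profit but revenue is down".split()
d2 = "quorus narrows quarter loss but revenue decreases further".split()
collection = d1 + d2
query = "revenue down".split()
for name, doc in (("d1", d1), ("d2", d2)):
    print(name, math.exp(query_log_likelihood(query, doc, collection)))
# d1 scores 3/256 and d2 scores 1/256 (up to float rounding), so d1 ranks first
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow for long queries.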
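
The Naïve Bayes classifier listed under text classification can be sketched in its multinomial form with add-one (Laplace) smoothing, working in log space. The training set and the class labels "china" / "not-china" are a toy example; the Bernoulli model differs in that it scores every vocabulary term by presence or absence and ignores term counts.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """docs: list of (tokens, label). Returns log priors, add-one-smoothed
    log conditional probabilities log P(t | c), and the vocabulary."""
    vocab = {t for tokens, _ in docs for t in tokens}
    docs_per_class = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in docs_per_class.items()}
    log_cond = {}
    for c in docs_per_class:
        total = sum(term_counts[c].values())
        log_cond[c] = {t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                       for t in vocab}
    return log_prior, log_cond, vocab


def apply_multinomial_nb(log_prior, log_cond, vocab, tokens):
    """Return (best class, per-class log scores); unseen terms are ignored."""
    scores = {c: log_prior[c] + sum(log_cond[c][t] for t in tokens if t in vocab)
              for c in log_prior}
    return max(scores, key=scores.get), scores


train = [
    ("chinese beijing chinese".split(), "china"),
    ("chinese chinese shanghai".split(), "china"),
    ("chinese macao".split(), "china"),
    ("tokyo japan chinese".split(), "not-china"),
]
model = train_multinomial_nb(train)
label, scores = apply_multinomial_nb(*model, "chinese chinese chinese tokyo japan".split())
print(label)  # china
```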
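
One common feature selection method for text classification is expected mutual information between term occurrence and class membership, computed from a 2x2 contingency table of document counts; the k highest-scoring terms are kept as features. A minimal sketch; the contingency counts for a hypothetical "sports" class are made up.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Expected mutual information I(U; C) in bits.

    n11: docs in the class that contain the term   n10: docs outside it that do
    n01: docs in the class without the term        n00: docs outside it without
    """
    n = n11 + n10 + n01 + n00
    with_term, without_term = n11 + n10, n01 + n00
    in_class, not_in_class = n11 + n01, n10 + n00
    mi = 0.0
    for count, row, col in ((n11, with_term, in_class), (n10, with_term, not_in_class),
                            (n01, without_term, in_class), (n00, without_term, not_in_class)):
        if count:  # treat 0 * log 0 as 0
            mi += count / n * math.log2(n * count / (row * col))
    return mi


def select_features(counts, k):
    """counts: term -> (n11, n10, n01, n00). Keep the k terms with the highest MI."""
    return sorted(counts, key=lambda t: mutual_information(*counts[t]), reverse=True)[:k]


# Hypothetical contingency counts for a "sports" class in a 100-document collection.
counts = {"goal": (40, 5, 10, 45), "the": (49, 48, 1, 2), "weather": (5, 6, 45, 44)}
print(select_features(counts, 1))  # ['goal']
```

A term distributed independently of the class scores 0 bits, while a term that perfectly splits a balanced collection scores 1 bit.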
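
K-means, from the clustering topic, alternates a reassignment step (each point joins its nearest centroid, a hard assignment) with a recomputation step (each centroid moves to the mean of its members) until assignments stop changing. The sketch below assumes seeding with k randomly chosen input points and reports the residual sum of squares (RSS); the points and the seed parameter are illustrative. A soft-assignment variant such as EM would instead give each point a degree of membership in every cluster.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Flat clustering with hard assignment. Returns (centroids, labels, RSS)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seeds: k randomly chosen input points
    labels = None
    for _ in range(max_iter):
        # Reassignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Recomputation step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean(members)
    rss = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, rss


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in (1, 2, 3):
    _, labels, rss = kmeans(points, k)
    print(k, labels, round(rss, 3))
```

Running the algorithm for several values of k and looking for the point where RSS stops dropping sharply (the "knee") is one heuristic for choosing k; minimizing RSS alone is not enough, since the best achievable RSS never increases as k grows.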