Financial Content Assistant

An intelligent RAG-powered system for financial document analysis

Final Project by Nilay Raut | INFO7375: Prompt Engineering and AI

Project Overview

The Financial Content Assistant is an AI-powered tool that helps financial professionals quickly extract insights from complex documents. Using Retrieval-Augmented Generation (RAG) technology, it allows users to ask natural language questions about financial reports and receive accurate, contextual answers with proper source attribution.

Built with financial domain expertise, the system understands industry terminology and provides appropriate context for financial metrics and analyses.

Financial Content Assistant Dashboard

The system allows users to:

  • Upload financial documents (PDF, TXT, CSV, XLSX)
  • Ask questions in natural language
  • Get answers with source attribution
  • Identify patterns across documents

Key Features

Multi-Document Processing

Process various financial document formats with intelligent extraction

Semantic Search

Vector-based retrieval to find relevant financial information

Contextual Answers

Precise answers with sources and question type detection

System Architecture

System Design

The Financial Content Assistant uses a modular, five-layer architecture for efficient document processing and question answering:

  • Document Processing
  • User Interface
  • Vector Database
  • Question Processing
  • Response Generation
System Architecture Diagram

Demo Video

See the Financial Content Assistant in Action

This video demonstrates the key features of the system:

  • Uploading financial documents
  • Asking natural language questions
  • Getting contextual answers with source attribution

Technical Implementation

Core Technologies

LangChain OpenAI FAISS PyPDF

Key Components:

  • Document Chunking: 1000/200 chunk strategy
  • Vector Embeddings: OpenAI embeddings
  • RAG Chain: Financial domain prompts

Performance Highlights

Metric Value
Document Processing ~1.5 sec/page
Query Response Time ~2-4 seconds
Source Relevance 85-90%
Answer Accuracy 90%

Challenges & Solutions

PDF Text Extraction

Problem: Complex financial PDFs with tables and charts produced inconsistent extraction results.

My Solution:
Custom error handling and multiple encoding support

Result: 95% improvement in PDF parsing success

Context Relevance

Problem: Retrievals sometimes missed critical financial information.

My Solution:
Optimized chunking (1000/200) and enhanced metadata

Result: 40% improvement in retrieval relevance