PROJECTS
Welcome to my projects section. I’ve described some of my academic projects where I applied data analysis, a variety of tools, and analytics skills to solve interesting problems. Each project includes the tools I used, the approach I took, and the results or insights I achieved.
Generative AI and Chatbot Applications
These projects focus on creating AI-powered tools, including chatbots and apps, to solve real-world problems in business and education. They use advanced technology like generative models and data retrieval to help users with tasks like getting financial insights, managing coursework, and finding relevant news.
AI-Driven Financial Newsletter for Investment Insights
Tools: Streamlit, ChromaDB, OpenAI, Alpha Vantage, Bespoke Labs, Python, PowerPoint
Approach: I developed an interactive Streamlit app that automatically generates financial newsletters. Using ChromaDB for document retrieval and OpenAI for language generation, the app pulls market data from Alpha Vantage and integrates it with Bespoke Labs for accuracy verification. I also built a chatbot interface to allow users to ask questions, get portfolio analysis, and explore insights on market trends, news sentiment, and investment strategies.
Results / Insights: The app enables users to receive personalized, accurate, and data-driven investment insights in real time. It demonstrates the combination of AI, data retrieval, and interactive visualization to simplify complex financial analysis for end users.
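The core retrieval-and-generation step can be sketched in a few lines. This is a minimal illustration, not the app's actual code: the function name and prompt wording are hypothetical, and the ChromaDB query and OpenAI call are stubbed out.

```python
# Minimal sketch of retrieval-augmented prompt assembly: combine market
# data with retrieved context chunks before calling the language model.
# Function and variable names here are illustrative only.

def build_newsletter_prompt(market_summary: str, retrieved_docs: list[str]) -> str:
    """Combine market data with retrieved context into one LLM prompt."""
    context = "\n\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "You are a financial newsletter writer.\n"
        f"Market data:\n{market_summary}\n\n"
        f"Relevant context:\n{context}\n\n"
        "Write a concise, accurate newsletter section."
    )

# In the real app, retrieved_docs would come from a ChromaDB similarity
# query over indexed documents, and the prompt would be sent to OpenAI.
prompt = build_newsletter_prompt(
    "AAPL +1.2%, SPX flat",
    ["Apple beat earnings expectations.", "Fed held rates steady."],
)
```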
Hepha Course and Project Assistance Bot
Tools: NLP, Vector Databases, Python
Approach: I developed Hepha, a specialized chatbot to support IST 687 coursework and project management. Using NLP, the bot understands student queries, and vector databases ensure precise, context-aware responses. It operates in two modes: Coursework, providing help with R programming, troubleshooting, and understanding key concepts, and Project, managing tasks via a Scrum board to improve team collaboration and task tracking.
Results / Insights: Hepha streamlines student workflows, automates task distribution, and delivers tailored guidance in real time. The bot enhances engagement, helps students complete projects on time, and demonstrates how AI and automation can improve learning and teamwork in academic settings.
Building an iSchool Chatbot with Retrieval-Augmented Generation (RAG)
Tools: Streamlit, Retrieval-Augmented Generation (RAG), Vector Databases, LLMs
Approach: I developed a chatbot to assist iSchool students with questions about student organizations. Using RAG, the bot leverages a vector database built from HTML documents to provide more accurate and context-aware answers. It includes conversation memory buffering (storing the last 5 interactions) and allows users to select from three different LLMs via a sidebar. The app delivers responses through an interactive Streamlit interface and evaluates LLM performance across multiple queries. Deployment considerations included secure API key management and a requirements file to ensure smooth setup and use.
Results / Insights: The chatbot improves information access for students by providing reliable, context-rich answers in real time. It demonstrates the power of combining RAG with LLMs for academic support and showcases best practices in deploying AI solutions securely and effectively.
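The conversation-memory buffer is the simplest piece to show. This sketch keeps only the last 5 question/answer interactions so the prompt stays bounded; in the real app the history would live in Streamlit's `st.session_state`, and the function name here is illustrative.

```python
# Sketch of conversation memory buffering: keep only the most recent
# MAX_TURNS question/answer interactions. In the deployed app this list
# would be stored in st.session_state across Streamlit reruns.

MAX_TURNS = 5

def add_turn(history: list[dict], question: str, answer: str) -> list[dict]:
    """Append a Q/A pair and trim to the most recent MAX_TURNS."""
    history.append({"user": question, "assistant": answer})
    return history[-MAX_TURNS:]

history: list[dict] = []
for i in range(8):
    history = add_turn(history, f"question {i}", f"answer {i}")
# After 8 turns, only interactions 3 through 7 remain.
```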
Multi-LLM URL Summarizer Application
Tools: Streamlit, Multiple LLMs (OpenAI, Claude, Gemini), Web Scraping, API Management
Approach: I built a multi-page Streamlit application that summarizes web page content using multiple LLMs. Users can input a URL, select the summary type, choose the output language, and pick from various LLM options. The app evaluates both advanced and cost-effective models to measure their performance on summarization tasks. Deployment considerations included secure API key management and an intuitive, user-friendly interface.
Results / Insights: The application enables users to quickly generate accurate and tailored summaries from web pages, while providing comparative insights into the efficiency and output quality of different LLMs. It demonstrates the practical application of AI for information processing and highlights how multi-model evaluation can optimize results for users.
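The configuration step, where the user's choices shape the request sent to the chosen LLM, might look like this. The summary styles and prompt wording are assumptions for illustration, not the app's actual strings.

```python
# Sketch of building the summarization instruction from user choices
# (summary type and output language). The styles and wording below are
# illustrative; the real app also routes the request to the chosen LLM.

SUMMARY_STYLES = {
    "bullet": "Summarize the page in 5 bullet points.",
    "paragraph": "Summarize the page in one short paragraph.",
}

def build_summary_request(page_text: str, style: str, language: str) -> str:
    """Compose the instruction prepended to the scraped page text."""
    instruction = SUMMARY_STYLES.get(style, SUMMARY_STYLES["paragraph"])
    return f"{instruction} Respond in {language}.\n\n{page_text}"

request = build_summary_request("Example page text ...", "bullet", "Spanish")
```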
News Reporting Bot for Real-Time Content Ranking
Tools: Retrieval-Augmented Generation (RAG), Prompt Engineering, LLMs
Approach: I designed a news bot for a law firm to enable ranked searches and topic-focused news insights. Using RAG and prompt engineering, the bot highlights the most relevant news items, filtering content to focus on trends, legal developments, and case relevance.
Results / Insights: The bot provides legal professionals with timely, targeted insights to support informed decision-making. It demonstrates how AI can streamline information retrieval and improve efficiency in specialized professional domains.
The What to Wear Bot: Your Weather and Style Advisor
Tools: Streamlit, OpenWeatherAPI, OpenAI / LLMs
Approach: I developed a Streamlit chatbot that combines real-time weather data with personalized clothing and activity recommendations. The bot fetches weather information for a user-specified city using OpenWeatherAPI and leverages OpenAI LLMs to suggest appropriate attire and advise on activities, such as whether it’s a good day for a picnic. For simplicity, the bot defaults to “Syracuse, NY” if no city is entered and allows users to choose between multiple LLMs.
Results / Insights: The bot provides users with a fun, interactive, and practical experience, demonstrating how AI can merge real-time data with personalized recommendations. It showcases the potential for conversational AI to deliver actionable insights in everyday life.
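The default-city and recommendation logic might be sketched as below. The temperature thresholds and advice strings are assumptions; in the real bot, the temperature comes from OpenWeatherAPI and an LLM phrases the advice conversationally.

```python
# Sketch of the bot's fallback logic: default to "Syracuse, NY" when no
# city is given, and map temperature to a clothing suggestion. Thresholds
# and advice text are illustrative, not the app's actual values.

DEFAULT_CITY = "Syracuse, NY"

def suggest_outfit(temp_f: float, city: str = "") -> str:
    """Return a clothing/activity suggestion for the given temperature."""
    city = city or DEFAULT_CITY
    if temp_f < 40:
        advice = "heavy coat and gloves"
    elif temp_f < 65:
        advice = "light jacket"
    else:
        advice = "t-shirt weather -- a picnic could work"
    return f"In {city}: {advice}."

message = suggest_outfit(30)
```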
Building a RAG Chatbot with ChromaDB for Course Information Assistance
Tools: Streamlit, ChromaDB, OpenAI, Retrieval-Augmented Generation (RAG)
Approach: I developed a Course Information Chatbot using a RAG pipeline that integrates ChromaDB and OpenAI. The system processes seven PDF files, converts them into a vector database using OpenAI embeddings, and stores data efficiently in session state to reduce costs. Prompt engineering is used to fetch relevant information and augment LLM responses, ensuring accurate and context-aware answers. The chatbot is deployed via Streamlit with a secure conversational interface and robust API key management.
Results / Insights: The chatbot delivers precise, context-rich course information in real time, improving student access to resources. It demonstrates the effective combination of RAG, vector databases, and LLMs to create efficient and scalable AI-driven academic support tools.
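The document-preparation step of this pipeline can be sketched as a simple chunker: extracted PDF text is split into overlapping word chunks before embedding. Chunk sizes here are illustrative, and the real pipeline caches the resulting vector store in Streamlit session state so the PDFs are not re-embedded on every rerun.

```python
# Sketch of splitting extracted PDF text into overlapping chunks before
# embedding into ChromaDB. Sizes below are illustrative defaults.

def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word chunks with a small overlap for context."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks

chunks = chunk_text("word " * 120)  # 120 identical words for the demo
```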
Interactive Chatbot with Conversation Memory and Simplified Responses
Tools: Streamlit, OpenAI, Conversation Buffering
Approach: I developed a conversational chatbot that retains memory of the last two user messages and responses to provide a seamless interaction. The bot dynamically manages tokens for efficient LLM usage and offers follow-up information when prompted. It is designed to deliver responses understandable by a 10-year-old, ensuring accessibility and clarity. The chatbot is deployed online with secure API key management and a user-friendly interface.
Results / Insights: The chatbot creates an engaging, responsive, and intuitive user experience while demonstrating how conversation memory and simplified outputs can improve accessibility in AI applications.
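The two-exchange memory policy is easy to show. This sketch keeps the last two user/assistant exchanges and estimates token usage with a word-count proxy; the real app would count tokens with the model's tokenizer, and the message contents below are made up for illustration.

```python
# Sketch of the memory policy: keep only the last two user/assistant
# exchanges, with a rough word-count proxy for token budgeting (the
# deployed app would use the model's actual tokenizer).

def trim_history(messages: list[dict], max_exchanges: int = 2) -> list[dict]:
    """Keep the most recent exchanges (one exchange = user + assistant)."""
    return messages[-2 * max_exchanges:]

def rough_token_count(messages: list[dict]) -> int:
    return sum(len(m["content"].split()) for m in messages)

msgs = [
    {"role": "user", "content": "What is gravity?"},
    {"role": "assistant", "content": "It pulls things down."},
    {"role": "user", "content": "Why is the sky blue?"},
    {"role": "assistant", "content": "Air scatters blue light more."},
    {"role": "user", "content": "Tell me about stars."},
    {"role": "assistant", "content": "Stars are giant balls of hot gas."},
]
kept = trim_history(msgs)  # only the last two exchanges survive
```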
Machine Learning
These projects use machine learning to solve real-world problems, like predicting emissions, analyzing health data, and estimating house rent prices. Using methods such as regression, neural networks, and decision trees, these projects produced better predictions and useful insights.
Emissions Forecasting and Traffic Optimization Using Machine Learning
Tools: Python, Time-Series Models, Regression, Clustering, Machine Learning Models
Approach: I applied time-series models to analyze Uber traffic and emissions data in New York City. By studying traffic routes and patterns, I identified high-emission zones and predicted future emissions. Using regression and clustering techniques, I suggested alternative routes and optimal speed limits to improve traffic flow and reduce environmental impact.
Results / Insights: The project provided actionable recommendations for reducing emissions and optimizing traffic patterns. It demonstrates how data-driven modeling and machine learning can support sustainable urban planning and transportation efficiency.
Abalone Gender Prediction
Tools: Python, Exploratory Data Analysis (EDA), Logistic Regression, Neural Networks
Approach: I performed exploratory data analysis using scatterplots, histograms, and correlation analysis to understand relationships between abalone features and gender. Logistic regression models were built to identify significant predictors such as diameter and shucked weight. I also designed neural network models with 1–3 hidden nodes to improve prediction accuracy.
Results / Insights: The analysis identified key variables influencing gender prediction. The neural network achieved a maximum accuracy of 59.22%, highlighting both the potential and limitations of the dataset and modeling techniques. This project demonstrates practical experience in EDA, regression, and neural network implementation for classification tasks.
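The logistic-regression step can be illustrated with a toy fit by gradient descent. The data below is synthetic and two-dimensional purely for demonstration; the actual abalone analysis used real features such as diameter and shucked weight, so this is a sketch of the technique, not the project's code.

```python
# Toy logistic regression fit by gradient descent on synthetic,
# well-separated data. Illustrates the technique only; the project's
# real data and features are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)      # gradient of the log-loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean(((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y)
```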
Energy Efficiency Analysis
Tools: Python, Perceptrons, Decision Trees, Random Forests, Boosting, Neural Networks, Data Visualization
Approach: I analyzed building energy efficiency using statistical techniques and machine learning models, including perceptrons, decision trees, random forests, and boosting. Data was visualized with scatterplots and histograms, and multicollinearity was addressed by removing high-VIF variables to improve model performance.
Results / Insights: Neural networks with four hidden nodes achieved the highest accuracy of 99.55%. The analysis provided actionable recommendations for builders, highlighting efficient design elements such as compact building shapes and reduced heat transfer through walls and roofs. The project demonstrates the use of machine learning for practical energy efficiency optimization.
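The multicollinearity check mentioned above can be sketched directly: the variance inflation factor for a feature is 1 / (1 - R²) from regressing that feature on the others, and high-VIF columns are candidates for removal. The data below is synthetic, built so two columns are nearly collinear.

```python
# Sketch of a VIF check: regress each column on the others and flag
# columns with high variance inflation factors. Synthetic data only.
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        others = np.delete(X, i, axis=1)
        A = np.column_stack([others, np.ones(n)])   # add an intercept
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid = X[:, i] - A @ coef
        r2 = 1 - resid.var() / X[:, i].var()
        out[i] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.05, size=200)   # nearly collinear with a
c = rng.normal(size=200)                   # independent of a and b
vifs = vif(np.column_stack([a, b, c]))
```

Columns `a` and `b` get large VIFs and one of them would be dropped; the independent column `c` stays near 1.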
Pakistan House Rent Prices 2023
Tools: Python, K-NN, SVM, Neural Networks, Naïve Bayes, Feature Engineering, Statistical Analysis
Approach: I processed categorical data by creating dummy variables for property type and city, and engineered new features such as price quartiles. I applied multiple models, including K-NN, SVM, neural networks, and Naïve Bayes, to analyze how property characteristics influence rental prices. Statistical analysis was conducted to understand the impact of each factor.
Results / Insights: The K-NN model achieved the highest accuracy of 87.56%. Key insights included the influence of bedrooms, bathrooms, property type, and location-specific factors on rent prices, providing actionable guidance for property market analysis.
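The feature-engineering step can be sketched in plain Python: one-hot encoding a categorical column and bucketing prices into quartile labels, which is roughly what pandas `get_dummies` and `qcut` do in the actual workflow. The city names and prices below are illustration data only.

```python
# Sketch of the feature engineering: one-hot encode a categorical column
# and assign quartile labels (1..4) to prices. Plain-Python stand-in for
# pandas get_dummies / qcut; example values are made up.

def one_hot(values: list[str]) -> dict[str, list[int]]:
    """Return one indicator column per category, keyed by category name."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def quartile_labels(prices: list[float]) -> list[int]:
    """Label each price with its quartile, 1 (lowest) to 4 (highest)."""
    ranked = sorted(prices)
    n = len(ranked)
    cuts = [ranked[n // 4], ranked[n // 2], ranked[3 * n // 4]]
    return [sum(p > c for c in cuts) + 1 for p in prices]

cities = ["Lahore", "Karachi", "Lahore", "Islamabad"]
dummies = one_hot(cities)
labels = quartile_labels([100, 200, 300, 400, 500, 600, 700, 800])
```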
Medical Data Analysis and Modeling
Tools: Python, Perceptron Models, Support Vector Machines (SVM), Statistical Analysis
Approach: I performed statistical and correlation analyses to examine relationships between variables such as age, serum creatinine, and time with mortality. Perceptron models were built to classify mortality, and SVM models were used for advanced multivariable predictions.
Results / Insights: The perceptron models effectively handled binary classification tasks, achieving up to 83% accuracy. SVMs improved performance when incorporating additional variables, highlighting the importance of interaction effects in predicting mortality and demonstrating practical machine learning application in medical data analysis.
Data Visualization and Predictive Modeling
These projects use data visualization and analysis to provide valuable insights in different areas. Using tools like Python, SQL, Tableau, and Power BI, the projects help in understanding trends and making better decisions in sports, health, energy, and e-commerce.
FIFA World Cup Analysis (1930–2022)
Tools: Python, Tableau, Data Visualization, Data Cleaning
Approach: I processed raw World Cup data using Python to clean and prepare it for analysis. Interactive Tableau dashboards were created to explore historical rivalries, scoring trends, and tactical patterns across tournaments.
Results / Insights: The dashboards provided an engaging way to visualize the evolution of the World Cup, uncover key patterns, and deliver actionable insights for fans and analysts. This project demonstrates skills in data cleaning, visualization, and storytelling with sports data.
Predictive Modeling of Health Insurance Coverage
Tools: Python, Pandas, Scikit-learn, Predictive Modeling
Approach: I analyzed the Small Area Health Insurance Estimates (SAHIE) dataset to identify factors influencing health insurance coverage disparities. Using Python and machine learning techniques, I built predictive models to uncover patterns and relationships between demographic and socioeconomic variables and insurance access.
Results / Insights: The analysis provided actionable, data-driven insights for policymakers to improve access to health insurance among underrepresented communities. This project demonstrates the application of predictive modeling to support policy decisions and social impact initiatives.
Energy Consumption Analysis and Peak Demand Prediction
Tools: Python, Linear Regression, SVM, Random Forest, Shiny
Approach: I analyzed building energy consumption data using linear regression, SVM, and random forest models to identify key factors driving peak demand, including temperature, humidity, and building characteristics. I also developed a Shiny app to provide interactive predictions and visualize energy usage patterns.
Results / Insights: The analysis and app enabled actionable strategies, such as scheduling appliance use during off-peak hours and optimizing HVAC systems, supporting better grid management and energy efficiency. This project demonstrates the use of predictive modeling and interactive tools to drive practical energy solutions.
Optimizing E-commerce Profitability through Customer Lifetime Value Analysis
Tools: SQL, Power BI, Power Apps, Data Analysis
Approach: I designed a SQL database to collect and structure e-commerce customer data. Using Power BI dashboards and Power Apps, I analyzed customer lifetime value (CLV) to identify purchase patterns, demographic trends, and spending habits.
Results / Insights: The analysis provided actionable insights for marketing and retention strategies, helping improve customer engagement and overall profitability. This project demonstrates the use of SQL and BI tools to drive data-informed business decisions in e-commerce.
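A common way to compute CLV, consistent with the analysis above, is average order value times purchase frequency times expected customer lifespan. This sketch uses made-up order data; in the project, the equivalent aggregation ran over the SQL database and fed the Power BI dashboards.

```python
# Sketch of a simple customer-lifetime-value calculation:
# CLV = average order value x purchases per year x expected years.
# Order amounts below are illustration data only.

def clv(orders: list[float], purchases_per_year: float, years: float) -> float:
    """Estimate lifetime value from a customer's order history."""
    avg_order_value = sum(orders) / len(orders)
    return avg_order_value * purchases_per_year * years

value = clv([40.0, 60.0, 50.0], purchases_per_year=6, years=3)
```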