About Me
I am a Data Scientist specializing in developing scalable machine learning solutions, optimizing data pipelines, and leveraging AI to drive impactful business decisions. My expertise spans Python, PySpark, and cloud computing, particularly within AWS environments.
Beyond work, I enjoy gaming, reading, and creative writing, activities that sharpen my problem-solving mindset and inspire innovative thinking in both life and work.
What I Do
- Design and implement scalable machine learning solutions in Python and PySpark to handle large-scale datasets efficiently.
- Build and optimize end-to-end machine learning pipelines that cover every stage from data ingestion to model deployment.
- Conduct advanced statistical analysis and forecasting to provide data-driven insights that inform business strategy and operational improvements.
- Develop interactive data visualizations using Tableau and Power BI to present complex data in an intuitive and accessible format.
- Enhance DevOps workflows by automating model development and deployment processes, ensuring reproducibility and efficiency in production environments.
Education
The George Washington University
M.S. Data Science (GPA: 3.97/4.0)
Graduated: May 2024 | Washington, DC, USA
Visvesvaraya Technological University
B.E. Computer Science & Engineering (GPA: 8.1/10.0)
Graduated: July 2021 | Bangalore, India
Technical Skills
- Programming Languages: Python, R, SQL
- Big Data & Data Engineering: Apache Spark, PySpark, ETL
- Databases & Data Processing: PostgreSQL, MySQL, MongoDB
- Machine Learning & AI: Scikit-learn, TensorFlow, Keras, NLP, Computer Vision
- Statistical & Time Series Analysis: RStudio, Statsmodels, Vector Autoregression (VAR)
- Data Visualization & Analytics: Tableau, Power BI, Matplotlib, Seaborn, Microsoft Excel
- Cloud & DevOps: AWS (S3, Lambda, Redshift, SageMaker, EC2, EMR, Pipelines)
- Version Control & Development Tools: Git, GitHub, Jupyter Notebook
Professional Experience
Fannie Mae, Virginia, USA
Data Scientist | Enterprise Modeling and Analytics | August 2024 - Present
- Analyzing Amazon EFS Lifecycle Management to determine optimal storage class transitions for large-scale datasets, using AWS CloudTrail to track file access and activity.
- Developing and optimizing AWS Pipelines through their APIs to streamline model development workflows in a DevOps environment, automating machine learning model deployment and keeping it reproducible.
- Optimized large-scale data processing by migrating Python codebases to PySpark, achieving a 5x reduction in execution time and significantly lowering operational costs.
- Supported the AWS SageMaker V2 migration project, ensuring a seamless transition across teams by writing detailed technical documentation, troubleshooting implementation challenges, and offering hands-on support.
- Conducted performance testing on SageMaker custom images (CPU & GPU) across JupyterLab, Code Editor, and RStudio, ensuring compatibility and efficiency for machine learning workloads.
- Provisioned Infrastructure Change Requests (ICRs) using consumer contracts to transition a project to asset status, supporting long-term model deployment and data infrastructure stability.
Madison Energy Infrastructure, Virginia, USA
Operations Intern | Asset Management | June - August 2023
- Redesigned the company’s billing application using Microsoft Power Apps, improving processing speed and reducing page load times from minutes to seconds.
- Integrated a security access layer within Power Apps so that clients could view only their own purchased services, eliminating unauthorized data exposure incidents.
- Engineered an automated data pipeline using AWS Lambda that extracts and transforms data each month and stores it as structured Excel reports.
- Implemented a backup and disaster recovery system, automating data backups and restoration processes, ensuring high data availability and reducing risk of data loss.
Syntegral, New York, USA
Impact AI Intern | February - April 2023
- Curated and cleaned financial datasets from publicly available reports, ensuring structured, high-quality data for investment analysis models.
- Extracted key financial metrics from corporate disclosures and regulatory filings, standardizing them for use in predictive modeling.
- Developed a knowledge graph-based investment model, refining data relationships and improving prediction accuracy for investment decisions.
- Researched and mitigated model hallucinations, investigating their causes and implementing data validation techniques to enhance model reliability.
Qualcomm, Karnataka, India
Machine Learning Intern | May - August 2021
- Automated the cleaning of daily timing reports by developing a preprocessing pipeline, enabling seamless data extraction and structured dataset creation.
- Developed and deployed an LSTM-based classification model that categorizes the extracted data, saving 10 hours of manual processing per week.
- Implemented continuous model retraining to keep the LSTM model adaptive to evolving datasets, mitigating model drift and maintaining high classification accuracy.
Featured Projects
Breast Cancer Survival Prediction & Patient Clustering | (January - February 2025)
- Development of Predictive Models: Implemented Random Forest, XGBoost, and Neural Networks, achieving 74% accuracy in predicting patient survival outcomes based on clinical and genomic features.
- Patient Subgroup Clustering for Precision Medicine: Applied K-Means clustering to segment patients into distinct subgroups based on disease progression and treatment response, identifying patterns in tumor aggressiveness, metastasis risk, and survival probabilities.
- Stage-Specific Survival Analysis: Conducted an in-depth statistical analysis of survival trends across different cancer stages, uncovering critical risk factors such as tumor size, lymph node involvement, and metastatic spread, aiding in treatment decision-making and prognosis prediction.
Revenue Prediction & Customer Analytics for Supermarket Data | (January - May 2024)
- Comprehensive Data Preparation: Processed and cleaned five large supermarket datasets, ensuring high data integrity and reducing bias through NLTK-based product re-categorization.
- Advanced Predictive Modeling: Implemented multiple machine learning models including Linear Regression, Random Forest, XGBoost, Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and ARIMA. Identified Random Forest as the most accurate model for revenue prediction.
- Interactive Business Intelligence Application: Developed a user-friendly Streamlit application offering real-time product price comparisons, visualizations of historical pricing trends, and ownership insights, making data more accessible for business decision-making.
Immunization Coverage Analysis | (May 2024)
- Longitudinal Trend Analysis: Analyzed measles and rubella vaccination coverage over 24 years (2000-2024), identifying fluctuations and patterns using Tableau dashboards.
- Geospatial Mapping for Policy Insights: Created global and regional immunization heatmaps, highlighting coverage disparities and high-risk areas.
- Actionable Public Health Insights: Developed interactive dashboards for public health policymakers, facilitating strategic decision-making to optimize vaccination programs.
Analyzing Macroeconomic Trends | (August - December 2023)
- Multi-Factor Economic Analysis: Evaluated four critical economic indicators, including the Consumer Price Index (CPI) and the Producer Price Index (PPI), to assess economic stability and inflation trends.
- Industry-Specific Market Trends: Investigated sectoral fluctuations by analyzing PPI for Finished Consumer Foods and PPI for Finished Goods, providing insights into price volatility in consumer and producer markets.
- Macroeconomic Impact Assessment: Conducted a holistic evaluation of urban consumer pricing trends, linking them to domestic producer sector performance, enabling data-driven economic forecasting.
Skills and Leadership Activities
- Customer Service and Organization: As an Access Services Assistant at George Washington University’s library, assisted patrons in locating and borrowing books, cataloged and shelved materials, and ensured seamless library operations, honing problem-solving and organizational skills.
- Event Coordination and Communication: Volunteered at high-profile events such as the IMF Annual and Spring Meetings and the World Bank Group’s Development Data Partnership Day:
  - Coordinated logistics, including distributing name tags, seating guests, and helping attendees navigate the venue.
  - Demonstrated strong interpersonal skills by engaging with diverse professionals and managing multiple tasks in dynamic environments.
  - Gained exposure to global policy discussions and event operations.
- Creative Writing and Public Engagement: Served as a writer for the GW Desis Community for two years, crafting speeches, designing posters, and creating promotional materials for events. Developed strong communication skills and fostered a sense of community through storytelling and creativity.