Highly driven Data Scientist with a background in physics, specializing in bringing analytical rigor to complex business challenges. Passionate about time series forecasting, NLP, and machine learning at scale. Building on a solid foundation in mathematics and scientific methodology, I focus on turning raw data into actionable insights and robust, production-grade solutions.
- Time Series Forecasting: Predicting future trends using ARIMA, Prophet, RNNs, etc.
- OCR (Optical Character Recognition): Building end-to-end pipelines to extract, clean, and process text from image and PDF data.
- NLP (Natural Language Processing): Developing text classification, sentiment analysis, and language models for advanced text analytics.
- Recommendation Systems: Crafting personalized solutions using collaborative filtering, matrix factorization, and deep learning.
- Statistical Analysis & Modeling: Leveraging experimental design and advanced statistical methods to uncover data insights.
- Python (NumPy, Pandas, SciPy, scikit-learn, Matplotlib, Seaborn, Plotly)
- SQL (PostgreSQL, MySQL, BigQuery)
- JavaScript (basic front-end integration)
- C
- Bash (Linux scripting, automation)
- MATLAB
- PyTorch (deep learning, custom model architectures)
- XGBoost, LightGBM (gradient boosting frameworks)
- OpenCV (computer vision)
- Hugging Face (transformers, NLP pipelines)
- Airflow (workflow orchestration)
- Docker (containerization)
- Kubernetes (container orchestration)
- Spark (distributed computing)
- Google Cloud Platform (GCP)
- Amazon Web Services (AWS)
- Microsoft Azure
- CI/CD (GitHub Actions, Jenkins)
- Git (GitHub, GitLab)
- Jupyter Notebooks (prototyping and experimentation)
- Confluence, Jira (project tracking and documentation)
- Streamlit / Dash / Flask (web apps and data visualization)
- LaTeX (technical writing, scientific reports)
- MLflow (experiment tracking)
- Hydra (config management)
- pandera (data validation)
"Combining physics intuition with data-driven insights to deliver powerful, scalable solutions."