- Course Content Summary
- Environment Setup
- Installing Packages
- Running shell commands from Notebooks
- Getting Started with Git
- Python environments in VS Code
- Jupyter Notebooks in VS Code
- Data Science in VS Code tutorial
- Manage Jupyter Kernels in VS Code
- Quickstart for GitHub Codespaces
This repository covers advanced machine learning topics organized into the following key areas:
- Statistical methods for outlier detection (Tukey method, Z-score, Modified Z-score)
- Machine learning-based anomaly detection using PyOD (Python Outlier Detection)
- Anomaly detection for both time series and non-time series data
- Multiple algorithm families (proximity-based, clustering, neural networks)
- Automated machine learning using PyCaret
- AutoML libraries comparison (H2O AutoML, AutoGluon, FLAML)
- Low-code approaches to model selection and hyperparameter tuning
- Train-test splitting strategies
- K-Fold cross-validation techniques
- Stratified cross-validation for classification
- Group-based cross-validation for dependent data
- Understanding overfitting and generalization
- Concept drift vs. data drift detection
- Statistical tests for drift (KS-test, Chi-square, Population Stability Index)
- Automated retraining pipelines and triggers
- Model versioning and lifecycle management
- Bagging and boosting techniques
- Random Forests implementation and tuning
- Gradient Boosting algorithms
- AdaBoost and ensemble stacking
- Encoding categorical variables for ensemble models
- SHAP (SHapley Additive exPlanations) for global and local interpretability
- LIME (Local Interpretable Model-agnostic Explanations)
- Compliance with regulations (GDPR, fair lending laws)
- Building trust and ensuring fairness in ML models
- Generalized Linear Models (GLMs) for non-normal distributions
- Generalized Additive Models (GAMs) for non-linear relationships
- Interaction terms and feature interactions
- Link functions and model families (Poisson, Binomial, Gamma)
- Oversampling techniques (Random Oversampling, SMOTE, ADASYN, BorderlineSMOTE)
- Undersampling methods (Random Undersampling, Tomek Links, NearMiss)
- Combined approaches (SMOTETomek, SMOTEENN)
- Ensemble methods for imbalanced learning (Balanced Random Forest, EasyEnsemble)
- Cost-sensitive learning and class weights
- Linear regression with feature engineering
- Logistic regression for classification
- Regularization techniques (L1, L2, ElasticNet)
- Feature selection and dimensionality reduction
- Data quality checks and diagnostics
- Univariate imputation methods (pandas and scikit-learn)
- Multivariate imputation techniques (KNN, Iterative Imputer)
- Time series interpolation methods
- Marketing Mix Modeling (MMM) for budget allocation
- Multi-Touch Attribution (MTA) analysis
- ROI calculation for marketing channels
- Time series modeling for marketing impact
- Grid Search for exhaustive parameter search
- Randomized Search for efficient exploration
- Bayesian optimization using Optuna
- Advanced tuning strategies and best practices
- Advanced feature engineering techniques
- Handling dirty data and data quality issues
- Denoising using machine learning models
- Polynomial features and feature interactions
- Scikit-learn preprocessing pipelines
- Feature-engine transformations
- Facebook Prophet for trend and seasonality analysis
- Theta method for exponential smoothing
- Automated forecasting with StatsForecast
- Handling holidays and special events
- Multi-step ahead forecasting
- Autoencoders for data reconstruction and dimensionality reduction
- Generative Adversarial Networks (GANs) - DCGAN implementation
- Deep learning for feature learning and representation
- Introduction to NLP
- BERT and GPT architectures
- Transfer learning techniques and applications
- Introduction to the HuggingFace ecosystem
- Fine-tuning Large Language Models
- Fine-tuning with HuggingFace
- Fine-tuning with OpenAI
- Introduction to Large Language Models
- RAG Applications with LangChain
- OpenAI and Ollama APIs
- Experiment tracking using MLflow
- Experiment tracking using Weights & Biases (wandb)
- Model versioning and registry
- Artifact logging (models, plots, metrics)
- Model deployment and serving
- Reproducibility and collaboration workflows
Note
The repo was renamed from adv_ml_ds to advanced_machine_learning. Old URLs and your existing copies of the repo should continue to work as-is, thanks to GitHub's automatic redirects.
Below are instructions for setting up virtual environments using different tools.
This project uses uv for dependency management, which is significantly faster than standard pip.
- Install uv (if not already installed):
    # On macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # On Windows
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    # Or via pip
    pip install uv
- Sync the environment. This command creates the virtual environment and installs all dependencies defined in uv.lock (or pyproject.toml):
    uv sync
- Activate the environment:
    source .venv/bin/activate      # macOS/Linux
    .venv\Scripts\activate         # Windows
  Alternatively, run commands directly inside the environment with uv run, without activating it:
    uv run jupyter lab
- Create a virtual environment with Python's built-in venv module:
    python3.12 -m venv dev1        # macOS/Linux
    py -3.12 -m venv dev1          # Windows (Python launcher)
- Activate the environment:
    source dev1/bin/activate       # macOS/Linux
    dev1\Scripts\activate          # Windows
- Deactivate the environment:
    deactivate
- Create a conda environment:
    conda create -n dev1 python=3.12
- Activate the environment:
    conda activate dev1
- Deactivate the environment:
    conda deactivate
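To confirm which environment a shell is actually using after any of the activation steps above, a small check like the following may help. It is a minimal bash sketch, not part of this repo: the function name active_env is hypothetical, and it assumes that venv-style activation scripts (including the .venv created by uv) set VIRTUAL_ENV and that conda activate sets CONDA_DEFAULT_ENV.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the active environment in the current shell.
# Relies on VIRTUAL_ENV (venv/uv activation) and CONDA_DEFAULT_ENV (conda).
active_env() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "venv: $VIRTUAL_ENV"
  elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "conda: $CONDA_DEFAULT_ENV"
  else
    echo "none"
    return 1   # non-zero status so scripts can detect "no environment"
  fi
}
```

For example, after source dev1/bin/activate, running active_env should print venv: followed by the absolute path of dev1. A venv is checked first, so it wins if both a venv and a conda environment appear active.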
To add packages to the project (this updates pyproject.toml and uv.lock):
    uv add ipykernel pandas matplotlib scikit-learn seaborn
This ensures that all dependencies are tracked and reproducible.
With pip (inside an activated venv):
    pip install ipykernel pandas matplotlib scikit-learn seaborn
With conda (inside an activated conda environment):
    conda install ipykernel pandas matplotlib scikit-learn seaborn
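After installing, it can be worth confirming that the packages import in the interpreter you intend to use. The bash function below is a hypothetical helper (check_imports is not part of uv, pip, or conda); it defaults to python3 and, as an assumption of this sketch, can be pointed at another interpreter through a PYTHON variable. Note that import names do not always match package names: scikit-learn is imported as sklearn.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try importing each module and report failures.
# Uses $PYTHON if set, otherwise python3 from the current PATH.
check_imports() {
  local py="${PYTHON:-python3}"
  local missing=0
  local mod
  for mod in "$@"; do
    if "$py" -c "import $mod" 2>/dev/null; then
      echo "ok       $mod"
    else
      echo "MISSING  $mod"
      missing=1
    fi
  done
  return "$missing"   # 0 only if every module imported
}
```

For example, check_imports ipykernel pandas matplotlib sklearn seaborn inside an activated environment, or PYTHON=.venv/bin/python check_imports pandas sklearn to target the uv environment without activating it (macOS/Linux paths).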
Shell commands can be executed within a Jupyter Notebook by prefixing the command with an exclamation mark (!). This lets you run operating-system commands without leaving the notebook, for example to install packages:
    !uv pip install ipykernel pandas matplotlib scikit-learn seaborn
To get a copy of this repository on your local machine:
    git clone https://github.com/tatwan/advanced_machine_learning.git
    cd advanced_machine_learning
To update your local copy with the latest changes from the remote repository:
    git pull origin main
If you've made local modifications and want to update:
- Commit your changes first (if you want to keep them):
    git add .
    git commit -m "Your commit message"
    git pull origin main
- Stash your changes (if you want to set them aside temporarily):
    git stash
    git pull origin main
    git stash pop   # restore your changes
- If conflicts occur during a pull:
  - Git will report the conflicting files.
  - Edit the conflicted files to resolve the conflicts.
  - Stage the resolved files:
        git add <resolved_file>
  - Complete the merge:
        git commit
- Force update (use with caution: this discards all uncommitted changes to tracked files and any local commits not on origin/main):
    git fetch origin
    git reset --hard origin/main
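The commit, stash, and pull steps above can be combined into a small bash function that stashes only when tracked files have uncommitted changes. This is a sketch, not part of the repository: safe_pull and has_local_changes are hypothetical names, and main is assumed to be the default branch. Untracked files are ignored, matching the default scope of git stash push.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repo) for updating a local clone.

# Succeeds when tracked files have staged or unstaged changes.
# Untracked files are ignored, matching the default scope of `git stash push`.
has_local_changes() {
  [ -n "$(git status --porcelain --untracked-files=no)" ]
}

# Pull the latest changes, stashing local work first only if needed.
safe_pull() {
  local branch="${1:-main}"   # assumption: default branch is main
  if ! has_local_changes; then
    git pull origin "$branch"
    return   # propagate git pull's exit status
  fi
  git stash push -m "safe_pull autostash" || return 1
  if ! git pull origin "$branch"; then
    echo "Pull failed; your changes are still stashed (see: git stash list)." >&2
    return 1
  fi
  # Reapply the stash; if this reports conflicts, resolve them as described above.
  git stash pop
}
```

Run safe_pull from inside the clone (or safe_pull some-branch for another branch). If the pull itself fails, the stash is deliberately left in place rather than popped onto a half-updated tree.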
