Back to Network
Reading Sync0%

Machine Learning Masterclass — From First Principles to Trees, Ensembles, Clustering, and Model Validation

Cluster02 — Foundations
DateThursday, March 19, 2026
Tags
machine-learningsupervised-learningunsupervised-learningreinforcement-learningdeep-learningclassificationregressionk-meansnaive-bayesdecision-treesrandom-forestadaboostgradient-boostingxgboostgrid-searchcross-validationfeature-engineeringmodel-validationethicsbiasexplainabilitybeginner-friendly
Links5 incoming references

This note is my beginner-friendly master note for machine learning. I am writing it to connect the big picture of ML with the actual project notes I already have in my brain system.

My goal is not just to memorize model names. I want to understand what machine learning is trying to do, how the major model families differ, what the end-to-end workflow looks like, what the math intuition is, how model validation works, where ethics enters the process, and how all of this connects back to my projects in credit risk, churn, attrition, household classification, and NLP.


The Note at a Glance

This note gives me one connected path through machine learning:

Business problem
→ define target and success metric
→ inspect data and its quality
→ engineer useful features
→ choose a model family
→ split train / validation / test data
→ train the model
→ tune hyperparameters
→ validate properly
→ interpret the results
→ monitor and improve the model after deployment

This note is important because machine learning is not one algorithm. It is a full decision-making workflow.

At the highest level:

  • supervised learning learns from labeled examples
  • unsupervised learning finds structure without labels
  • reinforcement learning learns through reward and penalty
  • deep learning uses layered neural networks to learn complex patterns

The rest of the note helps me understand when each category makes sense and how the core models inside them work.


Why This Note Matters for My Other Notes

This note sits above several of my project notes and helps me organize them conceptually:

So this note is like a bridge:

statistics foundation
→ machine learning workflow
→ project-specific application

Part 1 — What Machine Learning Actually Is

1.1 The core idea

Machine learning is a way of teaching computer systems to detect patterns from data and use those patterns to make predictions, classifications, recommendations, or groupings.

Traditional programming often looks like this:

Rules + Input Data → Output

Machine learning flips the logic:

Historical Data + Known Outcomes → Learned Pattern
Learned Pattern + New Data → Predicted Output

So instead of explicitly writing every rule by hand, I give the algorithm examples and let it learn the structure.


1.2 What machine learning is not

Machine learning is not magic.

It is still built on:

  • data quality
  • assumptions
  • mathematical optimization
  • evaluation choices
  • human judgment
  • business constraints

If the data is biased, the model can learn bias.
If the labels are noisy, the model can learn noise.
If I choose the wrong metric, I can optimize the wrong business outcome.

That is why ML is as much about process discipline as it is about algorithms.


1.3 The main learning categories

Supervised learning

In supervised learning, I have:

  • input features X
  • known target labels y

The algorithm learns the relationship between them.

Examples:

  • default vs non-default
  • churn vs non-churn
  • fraud vs non-fraud
  • sentiment positive vs negative
  • revenue prediction

Supervised learning is divided into two broad tasks:

Regression

Used when the outcome is a continuous number.

Examples:

  • house price
  • sales amount
  • recovery rate
  • expected wait time

Classification

Used when the outcome is a category or class.

Examples:

  • yes / no
  • churn / no churn
  • spam / not spam
  • urban / rural

Unsupervised learning

In unsupervised learning, I do not have target labels.

The algorithm tries to discover hidden structure in the data.

Examples:

  • customer segmentation
  • grouping similar borrowers
  • anomaly detection
  • topic grouping

The model is not told the “correct answer.” It is only told to find structure.


Reinforcement learning

In reinforcement learning, an agent learns by interacting with an environment and receiving:

  • rewards for good actions
  • penalties for bad actions

This is common in:

  • robotics
  • game playing
  • dynamic decision systems
  • sequential control problems

This is not the core style of ML used in my current notes, but I should know what it is.


Deep learning

Deep learning is a subfield of machine learning that uses multi-layer neural networks.

It is especially strong in:

  • image tasks
  • speech tasks
  • text tasks
  • very large and complex nonlinear pattern recognition

In my own notes, the clearest related project is 5_bank_churn_neural_networks_masterclass, where I use a neural network classifier on structured tabular data.


Part 2 — Types of Data and Why They Matter

A model choice begins with understanding the outcome and predictors.

2.1 Continuous data

Continuous variables can take infinitely many values within a range.

Examples:

  • salary
  • height
  • revenue
  • loss amount

These often call for regression models.


2.2 Categorical or discrete data

Categorical variables represent groups or labels.

Examples:

  • default / non-default
  • churn / stay
  • department name
  • sentiment class

These often call for classification models or categorical analysis.


2.3 Why this matters

The type of target drives the model family.

Continuous outcome → regression
Categorical outcome → classification
No labels → clustering / unsupervised learning

That is one of the first things I should determine in any ML problem.


Part 3 — The Machine Learning Workflow

A strong ML process is more important than memorizing algorithms.

I can think of it as a practical modeling pipeline.

3.1 Plan

This means defining:

  • the business problem
  • the target variable
  • the success metric
  • the modeling constraints
  • the deployment context

Examples:

  • In churn, maybe recall matters because missing an at-risk customer is expensive.
  • In credit risk, missing a bad borrower can be more dangerous than rejecting a good one.
  • In HR attrition, interpretability may matter because managers need to understand the drivers.

So before building the model, I must know what “good performance” actually means.


3.2 Analyze

This includes:

  • exploratory data analysis
  • missing values
  • outliers
  • class imbalance
  • feature relationships
  • feature quality

This is where I decide whether my raw data is even ready for modeling.


3.3 Construct

This includes:

  • train / validation / test splits
  • preprocessing pipeline
  • feature engineering
  • model selection
  • hyperparameter tuning

This is where the model is actually built.


3.4 Execute

This includes:

  • evaluation
  • interpretation
  • communication
  • deployment thinking
  • monitoring

This is where the model becomes useful to the business.


Part 4 — Feature Engineering: How Data Becomes Model Input

A huge part of ML performance comes from feature engineering.

4.1 Feature selection

Feature selection means choosing which variables should enter the model.

I may remove:

  • ID columns
  • names
  • leakage variables
  • columns with almost no information
  • columns that would not exist at prediction time

Example:

  • CustomerId usually should not drive a churn model
  • Surname is usually not a stable predictive feature

4.2 Feature transformation

This means changing variables into a more usable form.

Examples:

  • standardization
  • min-max scaling
  • one-hot encoding
  • log transformation
  • binning

Some algorithms care a lot about scaling. Others do not.

For example:

  • Naive Bayes and distance-based methods can benefit from scaled inputs
  • tree-based methods usually do not need scaling

4.3 Feature extraction

This means creating new variables from existing variables.

Examples:

  • tenure / age
  • debt / income
  • monthly_hours > threshold
  • interaction terms

This can make the model see structure that was hidden in the raw columns.


4.4 Class imbalance

In many real-world classification problems, one class is much rarer than the other.

Examples:

  • fraud is rare
  • default is less common than non-default
  • churn may be a minority class

If I do nothing, the model may learn to predict the majority class almost all the time.

Common fixes

Downsampling

Reduce the majority class size.

Upsampling

Increase the minority class size by duplication or synthetic generation.

Class weights

Tell the algorithm to penalize mistakes on the minority class more heavily.

Threshold tuning

Keep the model the same, but change the decision threshold.

This matters because sometimes the real issue is not the model family but the classification threshold.


Part 5 — Recommendation Systems: A Useful Side Branch

Even though recommendation systems are not the center of my current project notes, I should know the two broad families.

5.1 Content-based filtering

This recommends items based on item attributes.

Example:

  • if I like a movie with a certain genre, director, and pace, the system recommends similar movies with similar content attributes

The model focuses on the item’s own features.


5.2 Collaborative filtering

This recommends items based on user behaviour patterns.

Example:

  • users who liked what I liked also liked these other items

The model focuses on interaction patterns across many users.


Part 6 — Ethics, Bias, and Explainability

This part is extremely important.

6.1 Garbage in, garbage out

If I train on biased, incomplete, or low-quality data, the model can learn those flaws.

Examples:

  • historical lending data may reflect past human bias
  • hiring data may reflect biased selection processes
  • churn labels may be noisy or incomplete

So a model is not automatically objective just because it is mathematical.


6.2 Class imbalance and unfair outcomes

If one class is rare, the model may ignore it.

If a subgroup is underrepresented, the model may perform worse on that subgroup.

So I should not evaluate only overall accuracy. I should also ask:

  • who is being missed?
  • which subgroup is getting more false positives?
  • which subgroup is getting more false negatives?

6.3 Explainability

Some models are easy to explain.
Some models are much harder to explain.

More interpretable

  • linear regression
  • logistic regression
  • small decision trees

Less interpretable

  • large random forests
  • gradient boosting
  • XGBoost
  • deep neural networks

This matters because high-stakes domains like finance, healthcare, and regulation often need strong interpretability and documentation.

That is one reason my 1_lending_club_credit_risk_masterclass note uses logistic regression so centrally.


Part 7 — Naive Bayes

Naive Bayes is one of the simplest probabilistic classifiers.

7.1 The core idea

It is based on Bayes’ theorem.

At a high level, Bayes’ theorem updates my belief about a class after seeing evidence.

The model asks:

Given these feature values, how likely is this observation to belong to class A versus class B?


7.2 Why it is called “naive”

It assumes that the predictors are conditionally independent given the class.

That is often unrealistic in the real world.

For example:

  • salary and credit score may be related
  • balance and utilization may be related
  • age and tenure may be related

But Naive Bayes can still perform surprisingly well even when the assumption is not perfectly true.


7.3 Gaussian Naive Bayes

Gaussian Naive Bayes is used when I assume the features follow roughly normal distributions within each class.

The model estimates class-conditional distributions and then applies Bayes’ theorem.


7.4 When it is useful

Naive Bayes is often useful when:

  • I want a fast baseline model
  • the dataset is not huge
  • the features are reasonably well-behaved
  • I want a quick benchmark for classification

It is especially well known in text classification.


7.5 Why scaling can matter here

When features exist on very different scales, some algorithms become unstable or hard to compare across dimensions.

Using a scaler like Min-Max scaling can help normalize the numeric range.

Example

from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
 
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
 
model = GaussianNB()
model.fit(X_train_scaled, y_train)

This does not mean scaling is always mandatory for every Naive Bayes setup, but it is often a practical preprocessing step in educational workflows.


Part 8 — Unsupervised Learning with K-Means Clustering

K-Means is one of the most common clustering algorithms.

8.1 The core goal

K-Means tries to divide the data into K clusters such that points inside a cluster are similar to one another and different from points in other clusters.

It is an unlabeled learning method.

That means I do not have a target column telling me the correct cluster.


8.2 The algorithm intuition

The algorithm works like this:

  1. choose K starting centroids
  2. assign each point to its nearest centroid
  3. recompute each centroid as the mean of the points in its cluster
  4. repeat until the centroids stop moving much

That final state is called convergence.


8.3 Why scaling matters a lot here

K-Means is distance-based.

So if one variable is on a huge scale and another is on a tiny scale, the bigger one can dominate the cluster structure.

That is why scaling is usually very important for K-Means.


8.4 How to choose K

This is one of the most common beginner questions.

Inertia

Inertia measures how tightly packed the points are inside their assigned clusters.

Lower inertia is better, but it almost always decreases as I add more clusters.

So inertia alone cannot choose the best K.


Elbow method

I plot inertia for many values of K.

I then look for the “elbow,” where adding another cluster stops giving a major improvement.

That point is often a reasonable compromise between simplicity and fit.


Silhouette score

The silhouette score measures both:

  • how close each point is to its own cluster
  • how far it is from other clusters

It ranges roughly from -1 to 1.

  • closer to 1 is better
  • near 0 means overlap
  • below 0 suggests poor assignments

8.5 Small code skeleton

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
 
kmeans = KMeans(n_clusters=4, random_state=42)
clusters = kmeans.fit_predict(X_scaled)
 
score = silhouette_score(X_scaled, clusters)
print(score)

Part 9 — Decision Trees

Decision trees are one of the most intuitive ML models.

9.1 Core idea

A decision tree splits the data step by step using features that best separate the outcome.

It behaves like a flowchart:

Is credit score < threshold?
→ yes / no
Is income < threshold?
→ yes / no
...
→ final prediction

9.2 Why they are attractive

Decision trees are attractive because they:

  • are easy to visualize
  • handle nonlinear relationships
  • can capture interactions automatically
  • need less preprocessing than many other models
  • are often robust to outliers

This is why they were so natural in my 6_employee_retention_tree_models_masterclass note.


9.3 Why they can fail

A single decision tree can overfit very easily.

That means it can memorize noise in the training data and perform badly on new data.

That is why tree-based systems often need hyperparameter control.


9.4 Important hyperparameters

Examples:

  • max_depth
  • min_samples_split
  • min_samples_leaf
  • max_features

These determine how flexible or constrained the tree becomes.

A shallower tree is simpler and more stable.
A very deep tree can become too specific to the training data.


Part 10 — Hyperparameter Tuning and Grid Search

Hyperparameters are settings chosen before training.

They are not learned directly from the data in the same way model coefficients are.

10.1 Why tuning matters

A model may be good in principle but weak in performance because its hyperparameters are poor.

For example:

  • a tree may be too deep
  • a random forest may use too few trees
  • an XGBoost model may learn too aggressively

10.2 Grid search

Grid search means I define a set of hyperparameter values to try, and the computer tests all combinations.

Example idea:

param_grid = {
    'max_depth': [3, 5, 7],
    'min_samples_leaf': [1, 2, 5]
}

The model trains and validates every combination, and then I choose the best one according to a chosen metric.


10.3 Why cross-validation is usually paired with grid search

If I tune on a single split, I may overreact to randomness in that split.

Cross-validation gives a more stable performance estimate.

That is why GridSearchCV is so common.


10.4 Code skeleton

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
 
model = DecisionTreeClassifier(random_state=42)
 
param_grid = {
    'max_depth': [3, 5, 7],
    'min_samples_leaf': [1, 2, 5]
}
 
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='f1'
)
 
grid.fit(X_train, y_train)
print(grid.best_params_)

Part 11 — Ensemble Learning

Ensemble learning means combining multiple models to create a stronger final predictor.

The intuition is simple:

a group of imperfect models can often outperform a single model if their errors are not identical.

This is sometimes called the wisdom of the crowd idea.

There are two major ensemble styles I should know here:

  • bagging
  • boosting

11.1 Bagging and Random Forest

The core idea

Bagging means building many models on different bootstrap samples and then aggregating their predictions.

A bootstrap sample is created by sampling rows with replacement from the original training set.

So each tree sees a slightly different version of the data.


Random Forest

Random Forest is the most famous bagging model for trees.

It adds two powerful randomness ideas:

  1. each tree gets a bootstrap sample of rows
  2. each split only considers a random subset of features

This reduces correlation between trees and lowers variance.


Why Random Forest works well

Random forests are powerful because they:

  • reduce overfitting relative to a single tree
  • handle nonlinear patterns
  • capture interactions
  • usually need less feature engineering than linear models

This connects directly to 6_employee_retention_tree_models_masterclass.


11.2 Boosting

Boosting builds models sequentially, not in parallel.

Each new model tries to learn from the mistakes of the earlier models.

This often creates very strong predictive systems.


AdaBoost

AdaBoost increases the weight of observations that were misclassified earlier.

So later learners pay more attention to hard cases.

The model keeps adjusting itself toward what earlier trees got wrong.


Gradient Boosting

Gradient boosting is a more general and powerful idea.

Instead of simply reweighting observations, it fits new trees to the residual error left by earlier models.

So each new learner is trying to reduce the remaining mistakes.


XGBoost

XGBoost is a highly optimized implementation of gradient boosting.

It is extremely popular for tabular data because it often delivers strong predictive performance.

This connects directly to 7_socio_economic_household_classification_masterclass and also to the credit-risk extension thinking in 1_lending_club_credit_risk_masterclass.


The tradeoff

Boosting is often more accurate than simpler models, but it is also:

  • harder to interpret
  • more sensitive to tuning
  • more computationally expensive

So stronger raw performance does not automatically mean it is always the best business choice.


Part 12 — Cross-Validation

Cross-validation is one of the most important validation ideas in ML.

12.1 Why a single train-test split is not always enough

Suppose I split the data once and get a performance score.

That score may depend partly on luck:

  • maybe the test split was unusually easy
  • maybe the validation split was unusually hard

Cross-validation reduces my dependence on one lucky or unlucky split.


12.2 K-fold cross-validation

In K-fold CV:

  1. I divide the training data into K folds
  2. I train on K-1 folds
  3. I validate on the remaining fold
  4. I repeat until every fold has served as validation once
  5. I average the results

This gives a more stable estimate of generalization performance.


12.3 Why it matters in interviews

If I say “my model got 96% accuracy,” a strong interviewer may ask:

Was that from one split, or did you cross-validate it?

That question is really about robustness.


Part 13 — Model Saving and Pickling

Sometimes a model takes a long time to train.

That makes it inefficient to retrain from scratch every time I reopen the notebook.

13.1 What pickling does

Pickling saves a trained Python object into a binary file.

That object can later be loaded back into memory.

So instead of retraining the model, I can reuse the saved version.


13.2 Small code skeleton

import pickle
 
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
 
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

13.3 Important caution

Pickling is convenient, but I must also keep:

  • the preprocessing pipeline
  • the training environment assumptions
  • the feature order
  • the versioning discipline

Saving only the model without the transformations can break deployment.

This connects strongly to the monitoring and pipeline-consistency lessons in 1_lending_club_credit_risk_masterclass.


Part 14 — Feature Importance

Even if a model is complex, I often still want to know which variables matter most.

14.1 Why feature importance helps

Feature importance can help me:

  • understand the drivers of predictions
  • explain the model to stakeholders
  • spot suspicious signals
  • reduce noisy variables

14.2 Tree-based importance

Tree-based models often produce feature importance scores based on how much each feature helps reduce impurity or improve splits.

Example:

  • income
  • tenure
  • credit score
  • monthly hours
  • satisfaction level

These can be ranked by importance.


14.3 Why I should be careful

Feature importance is useful, but it is not the same as causation.

A variable can be important for prediction without being the true real-world cause of the outcome.

That warning is important across all my notes.


Part 15 — Model Evaluation for Classification

Because many of my projects are classification problems, this section matters a lot.

15.1 Confusion matrix

A confusion matrix helps me see:

  • true positives
  • true negatives
  • false positives
  • false negatives

This gives a richer view than just accuracy.


15.2 Precision

Precision asks:

Out of everything I predicted as positive, how many were actually positive?

This matters when false positives are costly.


15.3 Recall

Recall asks:

Out of all the actual positives, how many did I successfully capture?

This matters when false negatives are costly.


15.4 Accuracy

Accuracy asks:

Out of all predictions, how many were correct?

This can be misleading under class imbalance.


15.5 F1 score

The F1 score is the harmonic mean of precision and recall.

It is useful when I want a balance between the two.


15.6 ROC and AUC

The ROC curve tracks:

  • true positive rate
  • false positive rate

across many thresholds.

The AUC summarizes that ranking performance into a single number.

These connect strongly to both 2_regression_analysis_masterclass and 1_lending_club_credit_risk_masterclass.


Part 16 — Connecting the Algorithms to My Projects

This part helps me see machine learning not as isolated theory, but as the common language underneath my notes.

16.1 Lending Club credit risk

In 1_lending_club_credit_risk_masterclass:

  • supervised learning is used for default prediction
  • logistic regression provides an interpretable scorecard
  • validation and monitoring show the full lifecycle
  • later extensions bring in LGD, EAD, and expected loss

This is the most complete lifecycle note in my system.


16.2 Bank churn

In 5_bank_churn_neural_networks_masterclass:

  • supervised classification is used to predict churn
  • threshold tuning matters because recall is important
  • the neural network is a more flexible model family than basic linear methods

16.3 Employee retention

In 6_employee_retention_tree_models_masterclass:

  • decision trees and random forests become the main learning tools
  • feature importance becomes very practical
  • hyperparameter control matters for tree stability

16.4 Household classification

In 7_socio_economic_household_classification_masterclass:

  • large tabular data is modeled with XGBoost
  • boosting becomes the strongest practical ensemble idea
  • careful feature handling and evaluation matter a lot

16.5 Twitter sentiment NLP

In 8_twitter_sentiment_nlp_masterclass:

  • text becomes numeric features through vectorization
  • classification logic still remains the same underneath
  • evaluation metrics still depend on the confusion matrix logic

Part 17 — The Math Intuition I Should Know for Interviews

I do not need to derive everything from scratch, but I should understand the broad mathematical ideas.

17.1 Objective functions

Every ML model is trying to optimize something.

Examples:

  • linear regression minimizes squared error
  • logistic regression maximizes likelihood
  • K-Means minimizes within-cluster distances
  • decision trees minimize impurity measures like Gini impurity or entropy
  • boosting iteratively reduces residual error

That means models differ not only in form, but also in what loss function they are trying to optimize.


17.2 Bias-variance tradeoff

A model can fail in two main ways:

High bias

The model is too simple.
It misses real structure.
This is underfitting.

High variance

The model is too flexible.
It memorizes noise.
This is overfitting.

Many ML techniques are really attempts to balance bias and variance.

Examples:

  • pruning trees
  • using random forests
  • regularization
  • cross-validation
  • hyperparameter tuning

17.3 Why ensembles often work

Bagging reduces variance.
Boosting reduces systematic error step by step.

That is why ensemble methods are so dominant in practice.


Part 18 — Python Toolkit I Should Know

The practical ML ecosystem usually includes:

  • pandas for data manipulation
  • numpy for numerical operations
  • matplotlib and seaborn for visualization
  • scikit-learn for modeling pipelines, preprocessing, CV, metrics, trees, clustering, and Naive Bayes
  • xgboost for gradient boosting on tabular data
  • pickle for model saving

This toolkit appears in different combinations across my project notes.


Part 19 — What I Should Say in Interviews

If I need to explain machine learning simply, I can say:

Machine learning is the process of using data to learn patterns that generalize to new observations. The real work is not just choosing a model. It is defining the target, engineering stable features, validating properly, controlling overfitting, choosing metrics that match the business problem, and making sure the final model is interpretable and robust enough for the use case.

If I need to explain supervised vs unsupervised learning:

Supervised learning uses labeled outcomes to learn prediction or classification. Unsupervised learning has no labels and instead discovers hidden structure, such as clusters or anomalies.

If I need to explain why tree ensembles are powerful:

A single tree is easy to understand but unstable. Ensemble methods improve that by combining many trees. Random forests reduce variance through bagging and random feature selection, while boosting learns sequentially from earlier errors and often achieves stronger predictive performance.

If I need to explain why metrics matter:

Accuracy alone can be misleading, especially under class imbalance. I choose metrics like precision, recall, F1, ROC-AUC, or KS based on the type of business error that matters most.


Part 20 — Final Summary I Want to Retain

What machine learning really is

Machine learning is a structured way of learning patterns from data and using those patterns on new observations.

The big categories

  • supervised learning
  • unsupervised learning
  • reinforcement learning
  • deep learning

The practical workflow

Plan
→ Analyze
→ Construct
→ Execute

The important model families in this note

  • Naive Bayes
  • K-Means
  • decision trees
  • random forests
  • AdaBoost
  • gradient boosting
  • XGBoost

The practical discipline I should remember

  • define the right target
  • split data properly
  • engineer features carefully
  • choose metrics that match the business problem
  • tune hyperparameters responsibly
  • validate with cross-validation when possible
  • do not confuse prediction with causation
  • do not confuse importance with causation
  • do not ignore ethics, bias, and explainability

The most important mental model

A machine learning project is not:

pick algorithm → get score

It is:

understand the problem
→ understand the data
→ build a defensible pipeline
→ validate honestly
→ communicate clearly
→ improve responsibly

That is the machine learning mindset I want to carry into all my notes and interviews.

Linked Mentions

CartPole Reinforcement Learning Masterclass — REINFORCE, Custom Rewards, PPO, DQN, Actor-Critic, SAC, and Twin-Q

03 — Portfolio Projects

Antidiabetic Drug Prescription Forecasting — Time-Series Modeling with STL Decomposition, Stationarity Testing, SARIMA, Rolling Forecasts, and MAPE

03 — Portfolio Projects

NIFTY 100 Portfolio Optimization — Modern Portfolio Theory, Log Returns, Sharpe Ratio, Monte Carlo Weights, and Efficient Frontier

03 — Portfolio Projects

Python for Data Analytics Master Cheat Sheet — Core Python, Pandas, Visualization, and Git

02 — Foundations

Tharun Kumar Gajula

01 — Profile