koushik16/Uni-and-Multi-Variante-Decision-Trees-Scratch-on-a-Custom-Dataset

Decision trees are a type of supervised machine learning algorithm used for classification and regression tasks. A decision tree has a tree-shaped structure: each internal node represents a test on a feature (attribute), each branch represents an outcome of that test, and each leaf node represents a class label (for classification) or a continuous value (for regression).

A decision tree is built by recursively splitting the data into smaller subsets, with the tree developed incrementally as the splits are chosen. The final tree consists of decision nodes (where tests are applied) and leaves (which hold the final outcomes).
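As a minimal sketch of that recursive structure (the class and function below are illustrative, not code from this repository), prediction simply walks from the root to a leaf, applying one test per internal node:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a decision tree: either an internal test or a leaf."""
    feature: Optional[int] = None        # index of the feature tested here (internal nodes)
    threshold: Optional[float] = None    # split point for that feature
    left: Optional["Node"] = None        # subtree for samples with x[feature] <= threshold
    right: Optional["Node"] = None       # subtree for samples with x[feature] > threshold
    prediction: Optional[object] = None  # label stored at a leaf (None for internal nodes)

def predict(node: Node, x) -> object:
    """Follow the tests from the root until a leaf is reached."""
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction
```

A tree like this is univariate when each node's test inspects a single feature; the multivariate variant described below replaces the per-node test with a function of several features.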

Univariate Decision Trees

A univariate decision tree is a decision tree in which the decision at each node is based on a single feature: whenever the tree splits the data, it considers one feature at a time.

Characteristics:

  • Simplicity: They are simpler to understand and interpret because the decision rule at each node involves only one feature.
  • Efficiency: They are usually more computationally efficient compared to multivariate decision trees.
  • Split Criteria: Common split criteria include the Gini index, information gain, and chi-square tests; the first two are sketched after this list.
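
A minimal sketch of the Gini index and information gain (the function names are illustrative; NumPy is assumed):

```python
import numpy as np

def gini(y: np.ndarray) -> float:
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def entropy(y: np.ndarray) -> float:
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(y_parent, y_left, y_right) -> float:
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(y_parent)
    children = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y_parent) - children
```

A univariate tree builder scans every feature and every candidate threshold, keeping the split with the lowest weighted impurity (equivalently, the highest information gain).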

Example: If you are using a decision tree to classify whether an email is spam or not, a univariate tree might make decisions like:

  • Is the word "discount" in the subject line?
  • Is the sender from a known domain?
  • Does the email contain more than one link?
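
Written as code, such a tree is just nested single-feature tests; the feature names and the "known domain" list below are invented for illustration:

```python
def classify_email(email: dict) -> str:
    """A tiny hand-built univariate tree: each test inspects exactly one feature."""
    if "discount" in email["subject"].lower():
        if email["num_links"] > 1:
            return "spam"
        return "not spam"
    if email["sender_domain"] in {"example.com", "company.org"}:  # illustrative "known" domains
        return "not spam"
    return "spam"

print(classify_email({"subject": "Huge DISCOUNT inside!",
                      "num_links": 3,
                      "sender_domain": "unknown.biz"}))  # -> spam
```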

Multivariate Decision Trees

A multivariate decision tree allows the decision at each node to be based on multiple features simultaneously, typically through a linear combination of features or some more complex function.

Characteristics:

  • Complexity: These trees are more complex and harder to interpret because each decision can involve multiple features.
  • Flexibility: They can be more flexible and potentially provide better performance when the relationship between features is complex.
  • Split Criteria: The splitting criterion could be more sophisticated, involving linear or nonlinear combinations of features.

Example: In the same spam-detection scenario, a multivariate decision tree might combine several features in a single split, such as the frequency of certain words, the presence of a specific attachment type, and the sender's domain.
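
As a sketch, a multivariate (often called oblique) node branches on a weighted combination of features rather than on one feature at a time; the features and weights below are hypothetical, not learned from data:

```python
import numpy as np

def oblique_test(x: np.ndarray, w: np.ndarray, b: float) -> bool:
    """A multivariate node: branch on the sign of the linear combination w . x + b."""
    return float(np.dot(w, x)) + b > 0.0

# Hypothetical spam features: [frequency of "discount", has executable attachment, sender unknown]
x = np.array([0.8, 1.0, 1.0])
w = np.array([2.0, 3.0, 1.5])  # illustrative weights; in practice they are fit at each node
b = -2.5
print("spam" if oblique_test(x, w, b) else "not spam")  # -> spam
```

In practice the weights at each node are chosen to optimize a split criterion over the samples reaching that node, for example by fitting a linear classifier there, which is part of what makes multivariate trees costlier to train.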

Key Differences:

  • Interpretability: Univariate decision trees are easier to interpret because each decision is based on a single feature, whereas each decision in a multivariate tree involves several features at once.
  • Model Complexity: Multivariate decision trees can model more complex relationships by considering multiple features at once, while univariate trees are limited to simpler, one-dimensional splits.
  • Computational Cost: Multivariate trees generally require more computational power to construct, as they evaluate combinations of features rather than just individual features.

Applications:

  • Univariate: Often used when model interpretability is crucial or when the relationships between features are straightforward.
  • Multivariate: Used in more complex scenarios where features interact in non-trivial ways, and where capturing these interactions can improve predictive performance.

About

Implementing univariate and multivariate decision trees from scratch on a custom dataset
