Consider, for example, a coin toss. With a fair coin, if you (or a machine learning algorithm) were to predict the next flip, you would not be able to predict the outcome with any certainty: the system contains high entropy. A heavily weighted coin, on the other hand, has very low entropy; given the current information, we can almost definitively say that the next outcome will be tails. Most scenarios applicable to data science sit somewhere between these extremes of astronomically high and perfectly low entropy.

High entropy means low information gain, and low entropy means high information gain. Information gain can be thought of as the purity of a system: the amount of clean knowledge available in it.

Decision trees use entropy in their construction: to be as effective as possible at directing inputs down a series of conditions to a correct outcome, feature splits (conditions) with lower entropy (higher information gain) are placed higher in the tree.

Cross-entropy is one of the favorite loss functions of neural networks. Be it categorical, sparse, or binary cross-entropy, it is one of the default go-to loss functions for high-performing neural nets. It can also be used to optimize almost any classification algorithm, such as logistic regression.
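The fair-coin versus weighted-coin contrast can be made concrete with Shannon entropy. A minimal sketch (the `entropy` helper and the 95/5 weighting are illustrative choices, not from the original):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair = entropy([0.5, 0.5])        # fair coin: maximal uncertainty for two outcomes
weighted = entropy([0.95, 0.05])  # heavily weighted coin: next flip is nearly certain

print(fair)      # 1.0 bit
print(weighted)  # ~0.29 bits
```

The fair coin yields the maximum possible entropy for a two-outcome system (1 bit), while the weighted coin's entropy is close to zero, matching the intuition that its next flip carries little surprise.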
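The decision-tree idea of placing higher-information-gain splits higher in the tree can be sketched as the parent's entropy minus the size-weighted entropy of the two child groups. The function names and the tiny two-class example are illustrative assumptions:

```python
import math

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_children

parent = [0, 0, 1, 1]
# A split that separates the classes perfectly: gain equals the parent entropy (1 bit)
print(information_gain(parent, [0, 0], [1, 1]))  # 1.0
# A split that separates nothing: zero gain
print(information_gain(parent, [0, 1], [0, 1]))  # 0.0
```

A tree builder would evaluate this quantity for each candidate feature split and choose the split with the highest gain, which is exactly the "lower entropy goes higher in the tree" rule described above.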
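For the loss-function use, a minimal sketch of binary cross-entropy as it would appear in logistic regression or a neural net (the helper name, the epsilon clamp, and the example probabilities are assumptions for illustration):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a low loss; confident, wrong ones a high loss
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ~0.105
print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # ~2.303
```

The categorical and sparse variants mentioned above generalize the same negative-log-likelihood idea from two classes to many.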