Confidence Values If you’ve ever learned any basic statistics or probability then you’ve probably encountered the 68-95-99.7 rule at some point. This rule is simply the statement that, for a normally distributed variable, roughly 68% of values will fall within one standard deviation of the mean, 95% of values within two standard deviations, and 99.7% within three standard deviations. These confidence values are quite useful to memorize because values that are computed from data are often approximately normally distributed due to the central limit theorem.
Have a great idea for an article?
We're always looking for guest contributors or article suggestions. Shoot us an email at firstname.lastname@example.org, we would love to hear yours!
Choosing Weights: Small Changes, Big Differences There are a number of important, and sometimes subtle, choices that need to be made when building and training a neural network. You have to decide which loss function to use, how many layers to have, what stride and kernel size to use for each convolution layer, which optimization algorithm is best suited for the network, etc. With so many things that need to be decided, the choice of initial weights may, at first glance, seem like just another relatively minor pre-training detail, but weight initialization can actually have a profound impact on both the convergence rate and final quality of a network.