November 3, 2019
Calculus is the science of measuring continuous change across time or across space. Many things in data science change in a continuous way. Models themselves, for example, get more accurate over many iterations of training, and the gradient of their error can be followed so that you can identify and pluck the most accurate version of the model from the field of every possible version.
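That "following the gradient" idea can be sketched in a few lines of Python. This is a minimal, hypothetical example, not a real model: the loss function `f(w) = (w - 3)**2` and its derivative are made up for illustration, and each step moves `w` a little way downhill.

```python
# A minimal sketch of gradient descent on a made-up loss function.
# Calculus gives us the derivative, which points uphill; stepping
# against it moves us toward the most accurate "model" (here, w = 3).

def f(w):
    return (w - 3) ** 2      # hypothetical loss: smaller is better

def f_prime(w):
    return 2 * (w - 3)       # its derivative, from basic calculus

w = 0.0                      # starting guess
learning_rate = 0.1
for _ in range(100):         # each iteration nudges w downhill
    w -= learning_rate * f_prime(w)

print(round(w, 4))           # converges toward the minimum at w = 3
```

Each update is `w = 0.8 * w + 0.6`, so the guess shrinks toward the fixed point at 3 on every pass; the continuous shape of the loss is what makes that possible.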
Statistical Inference is facts and figures plus guessing. It attempts to make predictions about, and find correlations within, populations. Because it uses probability, it's never exact or definite; it is a different kind of mathematics from algebra and calculus. It tells you what is more or less likely within the realm of possibility. When events don't turn out how you expect, statistical inference does not take the blame.
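A small sketch of that "likely, not definite" quality, using only the standard library. The population mean of 50 and the sample size here are invented for illustration; the interval quantifies the guess rather than guaranteeing it.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: draw a sample from a population whose true
# mean (50) we pretend not to know, then infer it from the sample.
true_mean = 50
sample = [random.gauss(true_mean, 10) for _ in range(200)]

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / len(sample) ** 0.5  # standard error
ci = (mean - 1.96 * sem, mean + 1.96 * sem)          # rough 95% CI

print(f"estimate: {mean:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval says where the true mean *probably* lies; roughly 5% of intervals built this way will miss it entirely, which is exactly the "never exact or definite" character of inference.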
Python is high-level, abstract, and readable; object-oriented rather than functional; slow at recursion and at iterating over explicit loops; optimized for its own list comprehensions and for vector-wise operations, especially with libraries like NumPy and SciPy. Write Pythonically by following the PEP 8 style guide.
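A quick illustration of that loop-versus-comprehension point, on a toy list of numbers. Both versions produce the same result; the comprehension is the idiomatic, faster form in CPython (and libraries like NumPy push the same idea further by vectorizing the loop in C).

```python
numbers = list(range(10))

# Explicit loop: verbose, and comparatively slow in CPython.
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# List comprehension: the Pythonic, optimized equivalent.
squares_comp = [n ** 2 for n in numbers]

print(squares_comp)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```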
Explaining models to non-technical folks is an essential part of the data scientist’s job. Failure to do this responsibly has led to a lot of misleading reportage about the work of data science.
BIG DATA - data is ‘big’ if it is so unwieldy to work with on one machine that it requires distributed computing.