August 20, 2020

Logistic Regression

  1. A loss function (also called a cost function) measures how far a machine learning model’s predictions are from the true labels. Training aims to minimize it.

  2. The loss function for Logistic Regression is called log loss (also known as binary cross-entropy).

  3. The classification threshold is the predicted probability at which we decide which class a sample belongs to.

    1. The default threshold for many algorithms is 0.5. If the predicted probability is greater than or equal to the threshold, the sample is assigned to the positive class; otherwise, it is assigned to the negative class.
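Items 2 and 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the helper names `log_loss` and `classify`, the `eps` clipping value, and the sample labels and probabilities are all made up for the example.

```python
import numpy as np


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss (binary cross-entropy) over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_sample = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(per_sample))


def classify(y_prob, threshold=0.5):
    """Return 1 where the probability is >= threshold, otherwise 0."""
    return (np.asarray(y_prob) >= threshold).astype(int)


# Hypothetical true labels and predicted positive-class probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.5, 0.6]

loss = log_loss(labels, probs)
default_classes = classify(probs)                # threshold of 0.5
strict_classes = classify(probs, threshold=0.7)  # a stricter threshold

print(loss, default_classes, strict_classes)
```

Raising the threshold makes the model more reluctant to predict the positive class, which is why the third and fourth samples change class between the two calls.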

  4. scikit-learn (sklearn) is a Python library that helps build, train, and evaluate Machine Learning models.

    1. Create a Logistic Regression object (the class is imported with from sklearn.linear_model import LogisticRegression)

      1. model = LogisticRegression()

    2. Fit the model on the data

      1. model.fit(features, labels)

      2. After fitting, the learned coefficients and intercept are available as model.coef_ and model.intercept_

    3. Predict positive or negative class

      1. model.predict(features)

      2. To get the predicted probability of each class instead of the class label

        1. model.predict_proba(features)

    4. sklearn’s Logistic Regression model applies regularization by default, so the features should be normalized (put on a comparable scale) before fitting.
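The workflow in item 4 can be put together as the following sketch. The dataset is invented for illustration (two hypothetical features, hours studied and hours slept, with a pass/fail label), and `StandardScaler` is one common way to normalize features, chosen here as an assumption rather than something the notes prescribe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up data: [hours studied, hours slept] -> passed (1) or failed (0).
features = np.array([[1.0, 8.0], [2.0, 6.0], [3.0, 7.0], [4.0, 5.0],
                     [6.0, 7.0], [7.0, 6.0], [8.0, 8.0], [9.0, 5.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Normalize each feature to zero mean and unit variance before fitting.
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Create the model object, then fit it on the scaled data.
model = LogisticRegression()
model.fit(scaled_features, labels)

# Learned parameters: one coefficient per feature, plus an intercept.
print(model.coef_, model.intercept_)

# Predicted class labels (0 or 1) for each sample.
classes = model.predict(scaled_features)

# Predicted probabilities: one row per sample, columns ordered as model.classes_.
probabilities = model.predict_proba(scaled_features)
print(classes)
print(probabilities)
```

In practice the scaler would be fit on training data only and then reused with `scaler.transform` on any new samples before calling `model.predict`.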

  5. Feature Importance - When the features are on the same scale, we can compare the magnitudes and signs of the feature coefficients to determine which features have the greatest impact on class prediction and whether that impact is positive or negative.

    1. Features with larger positive coefficients increase the probability of a data sample belonging to the positive class as the feature’s value increases.

    2. Features with larger negative coefficients decrease the probability of a data sample belonging to the positive class as the feature’s value increases.

    3. Features with small positive or negative coefficients have minimal impact on the probability of a data sample belonging to the positive class.
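The comparison described in item 5 might look like the sketch below. Everything about the data is an assumption made for the example: the feature names are hypothetical, and the labels are generated synthetically so that the first feature has a strong positive effect, the second a weaker negative effect, and the third none.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names for a synthetic dataset.
feature_names = ["hours_studied", "hours_absent", "unrelated"]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Synthetic labels: strong positive effect from feature 0,
# weaker negative effect from feature 1, no effect from feature 2.
signal = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
y = (signal > 0).astype(int)

# Scale the features so the coefficient magnitudes are comparable.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

# coef_ has shape (1, n_features) for binary classification.
coefficients = model.coef_[0]

# Sort features from largest to smallest absolute coefficient.
order = np.argsort(-np.abs(coefficients))
ranking = [(feature_names[i], float(coefficients[i])) for i in order]

for name, coef in ranking:
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: {coef:+.2f} ({direction} the positive-class probability)")
```

The sign of each coefficient gives the direction of the effect, and the absolute value gives its relative strength, which only makes sense as a comparison because the features were scaled first.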
