August 20, 2020
Logistic Regression
A loss function (also called a cost function) measures how far a machine learning model's predictions are from the true labels. Training the model means finding the parameters that minimize this value.
The loss function for Logistic Regression is called Log Loss (also known as binary cross-entropy).
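As a rough illustration of how this loss behaves, here is a minimal sketch of the averaged binary log loss in plain Python. The function name log_loss, the eps clipping value, and the sample numbers are assumptions made for this example; labels are assumed to be 0 or 1, and each probability is the predicted probability of the positive class.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss for labels in {0, 1} and predicted P(positive class)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Predictions close to the true labels give a small loss ...
mostly_right = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
# ... while confident but wrong predictions are penalized heavily.
mostly_wrong = log_loss([1, 0, 1], [0.1, 0.9, 0.2])
print(mostly_right, mostly_wrong)
```

The loss approaches 0 as the predicted probabilities approach the true labels, and it grows without bound as a prediction becomes confidently wrong.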
The classification threshold is the probability cutoff at which we decide which class a sample belongs to.
The default threshold for many algorithms is 0.5. If the predicted probability is greater than or equal to the threshold, the sample is assigned to the positive class; otherwise, it is assigned to the negative class.
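The thresholding rule can be sketched in a few lines of Python. The helper name classify and the example probabilities are hypothetical, chosen only to show how moving the threshold changes the predicted classes.

```python
def classify(probabilities, threshold=0.5):
    """Return 1 (positive class) when the probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

probs = [0.20, 0.50, 0.65, 0.90]

default_labels = classify(probs)       # threshold of 0.5
strict_labels = classify(probs, 0.7)   # a higher threshold yields fewer positives

print(default_labels)
print(strict_labels)
```

Raising the threshold makes the model more conservative about predicting the positive class, while lowering it does the opposite.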
scikit-learn (imported as sklearn) is a Python library that helps build, train, and evaluate Machine Learning models.
Create a Logistic Regression object
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
Fit the model on the data
model.fit(features, labels)
After fitting, the learned coefficients and intercept are available as model.coef_ and model.intercept_
Predict positive or negative class
model.predict(features)
Predict the probability of each class
model.predict_proba(features)
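Putting these steps together, a minimal end-to-end sketch might look like the following. The synthetic dataset from make_classification stands in for real features and labels and is an assumption of this example, as are the variable names.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, used here only as a placeholder.
features, labels = make_classification(n_samples=200, n_features=4, random_state=0)

# Create the model and fit it on the data.
model = LogisticRegression()
model.fit(features, labels)

# One coefficient per feature, plus a single intercept.
print(model.coef_, model.intercept_)

# Hard class predictions (0 or 1) for each sample.
predicted_classes = model.predict(features)

# One row per sample, one column per class; column 1 is P(positive class).
predicted_probs = model.predict_proba(features)
positive_probs = predicted_probs[:, 1]
```

For a binary problem, predict_proba returns two columns that sum to 1 in each row, and the classes returned by predict correspond to applying a 0.5 threshold to the positive-class column.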
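The data should be normalized prior to using sklearn’s Logistic Regression model. The model applies regularization by default, and the regularization penalty treats features unevenly when they are on very different scales.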
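One common way to handle the scaling is to chain a scaler and the model in a pipeline. The sketch below uses StandardScaler, which is one choice among several, on synthetic data with one feature deliberately inflated; the data and variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features, labels = make_classification(n_samples=200, n_features=4, random_state=0)
# Inflate one feature so the columns are on very different scales.
features[:, 0] *= 1000.0

# The pipeline standardizes the features and then fits the classifier,
# so the same scaling is applied automatically at prediction time.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(features, labels)

# Inspect what the classifier actually saw: each column now has mean 0 and std 1.
scaled_features = scaled_model.named_steps["standardscaler"].transform(features)
print(scaled_features.mean(axis=0), scaled_features.std(axis=0))
```

Keeping the scaler inside the pipeline also avoids fitting it separately on data the model will later be evaluated on.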
Feature Importance - When the features are on the same scale, we can compare the feature coefficients’ magnitudes and signs to determine which features have the greatest impact on class prediction and whether that impact is positive or negative.
Features with large positive coefficients increase the probability of a data sample belonging to the positive class as their values increase.
Features with large negative coefficients decrease the probability of a data sample belonging to the positive class as their values increase.
Features with coefficients close to zero, whether positive or negative, have minimal impact on the probability of a data sample belonging to the positive class.
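The comparison described above could be sketched as follows. The feature names f0 to f3 are hypothetical, the data is synthetic, and the features are standardized first so that the coefficient magnitudes are comparable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features, labels = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names for illustration

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(features, labels)

# coef_ has shape (1, n_features) for a binary problem; take the single row.
coefficients = model.named_steps["logisticregression"].coef_[0]

# Rank features by absolute coefficient, largest impact first.
ranked = sorted(zip(feature_names, coefficients), key=lambda pair: abs(pair[1]), reverse=True)

for name, coef in ranked:
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: {coef:+.3f} ({direction} the probability of the positive class)")
```

The magnitude gives the ranking and the sign gives the direction of the effect on the positive-class probability.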