August 21, 2020

Logistic Regression Project

Predict the Titanic Survivors

  1. Load the Data

    1. Use pandas (import pandas as pd) to load the .csv file

      1. passengers = pd.read_csv('file_name')

  2. Clean the Data

    1. Use the .map( ) function to encode Sex as a number

      1. passengers['Sex'] = passengers['Sex'].map({'male': 0, 'female': 1})

    2. Replace missing values (NaN) using the .fillna( ) function

      1. passengers['Age'] = passengers['Age'].fillna(passengers['Age'].mean())

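    3. Store "First Class" and "Second Class" passengers as 0/1 columns using the .apply( ) function

      1. passengers['First Class'] = passengers['Pclass'].apply(lambda p: 1 if p == 1 else 0)

      2. passengers['Second Class'] = passengers['Pclass'].apply(lambda p: 1 if p == 2 else 0)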
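
The cleaning steps above can be sketched on a tiny, made-up DataFrame. The four rows here are hypothetical stand-ins for the real CSV, which would normally come from pd.read_csv; only the column names follow the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; the real data would be loaded with pd.read_csv.
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 30.0],
    "Pclass": [3, 1, 2, 1],
    "Survived": [0, 1, 1, 0],
})

# Encode sex as a number.
passengers["Sex"] = passengers["Sex"].map({"male": 0, "female": 1})

# Fill the missing age with the mean of the known ages.
passengers["Age"] = passengers["Age"].fillna(passengers["Age"].mean())

# 0/1 indicator columns for first and second class.
passengers["First Class"] = passengers["Pclass"].apply(lambda p: 1 if p == 1 else 0)
passengers["Second Class"] = passengers["Pclass"].apply(lambda p: 1 if p == 2 else 0)

print(passengers)
```

Assigning the result of .fillna( ) back to the column, rather than calling it with inplace = True on a column selection, avoids the chained-assignment warnings that newer pandas versions raise.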

  3. Select and Split Data

    1. features = passengers[['Sex', 'Age', 'First Class', 'Second Class']]

    2. survived = passengers['Survived']

    3. Save the train_test_split results as variables (train_test_split is imported from sklearn.model_selection)

      1. train_features, test_features, train_labels, test_labels = train_test_split(features, survived)

  4. Normalize the Data

    1. Create a standard scaler (StandardScaler is imported from sklearn.preprocessing)

      1. scaler = StandardScaler( )

    2. To determine scaling factors and apply the scaling to the feature data:

      1. train_features = scaler.fit_transform(train_features)

    3. To apply the scaling to the test data

      1. test_features = scaler.transform(test_features)
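
A minimal sketch of the split and scaling steps, assuming a made-up feature matrix in place of the Titanic data. The test_size and random_state values are illustrative choices, not part of the original notes. The point it shows is that the scaler learns its mean and standard deviation from the training rows only, then reuses them on the test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical features: columns stand in for Sex, Age, First Class, Second Class.
sex = np.tile([0, 1], 20)
age = np.linspace(1, 70, 40)
pclass = np.tile([1, 2, 3], 14)[:40]
features = np.column_stack([sex, age, pclass == 1, pclass == 2]).astype(float)
survived = (sex == 1).astype(int)  # made-up labels, only needed for the split

train_features, test_features, train_labels, test_labels = train_test_split(
    features, survived, test_size=0.25, random_state=0
)

scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)  # learn mean/std from training data
test_features = scaler.transform(test_features)        # reuse the training mean/std

print(train_features.mean(axis=0))  # approximately 0 for every column
print(train_features.std(axis=0))   # approximately 1 for every column
```

The test columns will generally not have mean 0 and standard deviation 1 exactly, which is expected: they are scaled with the training statistics.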

  5. Create and Evaluate the Model Using Logistic Regression

    1. model = LogisticRegression( )

    2. model.fit(train_features, train_labels)

    3. print(model.score(train_features, train_labels))

      1. Do the same for the test data: print(model.score(test_features, test_labels))

      2. What are the scores?

    4. Look at the coefficient data (model.coef_) to see which feature is the most useful

      1. Sex is the most useful feature, then First Class
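
One way to inspect the coefficients is sketched below. The data is synthetic and generated so that survival depends mostly on sex, then on first class, so it does not reproduce the actual Titanic scores. The feature_names list, the log_odds formula, and the random seed are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the Titanic dataset).
rng = np.random.default_rng(0)
n = 400
sex = rng.integers(0, 2, size=n)
age = rng.uniform(1, 70, size=n)
pclass = rng.integers(1, 4, size=n)
first_class = (pclass == 1).astype(int)
second_class = (pclass == 2).astype(int)
log_odds = -1.5 + 3.0 * sex + 1.5 * first_class + 0.5 * second_class - 0.01 * age
survived = (rng.uniform(size=n) < 1 / (1 + np.exp(-log_odds))).astype(int)

feature_names = ["Sex", "Age", "First Class", "Second Class"]
features = np.column_stack([sex, age, first_class, second_class]).astype(float)

train_features, test_features, train_labels, test_labels = train_test_split(
    features, survived, random_state=0
)
scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)
test_features = scaler.transform(test_features)

model = LogisticRegression()
model.fit(train_features, train_labels)

train_score = model.score(train_features, train_labels)
test_score = model.score(test_features, test_labels)
print(train_score, test_score)

# Rank features by absolute coefficient size; the sizes are comparable
# because every feature was standardized first.
ranked = sorted(zip(feature_names, model.coef_[0]), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranked:
    print(name, round(coef, 3))
```

model.coef_ has one row per fitted decision boundary, so for a binary target the coefficients of interest are in model.coef_[0], in the same order as the feature columns.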

  6. Predict with the Model

    1. Use samples with the same features (sex, age, class, …)

    2. Transform the data using the scaler.

      1. sample_passengers = scaler.transform(sample_passengers)

    3. Make a prediction using the normalized sample

      1. print(model.predict(sample_passengers))

      2. OR

      3. print(model.predict_proba(sample_passengers))  # to see probabilities
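
The prediction step can be sketched end to end as follows. The eight training rows and the two sample passengers are invented for illustration; the columns are assumed to be Sex, Age, First Class, Second Class, in the same order used for training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: Sex, Age, First Class, Second Class.
train_features = np.array([
    [0, 22, 0, 0],
    [1, 38, 1, 0],
    [1, 26, 0, 1],
    [0, 35, 1, 0],
    [0, 54, 0, 0],
    [1, 4, 0, 1],
    [1, 58, 1, 0],
    [0, 20, 0, 1],
], dtype=float)
train_labels = np.array([0, 1, 1, 0, 0, 1, 1, 0])

scaler = StandardScaler()
model = LogisticRegression()
model.fit(scaler.fit_transform(train_features), train_labels)

# Hypothetical passengers to predict, with the same columns as the training data.
sample_passengers = np.array([
    [0, 20, 0, 0],
    [1, 17, 1, 0],
], dtype=float)
sample_passengers = scaler.transform(sample_passengers)  # same scaler, no refitting

predictions = model.predict(sample_passengers)          # one 0/1 label per passenger
probabilities = model.predict_proba(sample_passengers)  # one [P(0), P(1)] row per passenger
print(predictions)
print(probabilities)
```

Each row of predict_proba sums to 1, and predict returns the class with the larger probability in that row.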
