August 13, 2020

Breast Cancer Classifier Project

  1. It is common practice to split function parameters on separate lines when there are too many for a single line:

    1. Example: func(

      1. param1,

      2. param2,

      3. param3,

      4. param4

    2. )

  2. Example: Store the results from a Python function returning arrays

    1. array = [1, 2, 3]

    2. first, second, third = array

  3. training_data, validation_data, training_labels, validation_labels = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size = 0.2, random_state = 100 )

  4. Use the classifier with k = 3 and check the score:

    1. classifier.fit(training_data, training_lables)

    2. print(classifier.score(validation_data, validation_labels))

  5. K was hardcoded to 3. What if we loop from 1 to 100 to see what K is most accurate?

    1. accuracies = [ ]

    2. for k in range(1, 101)

      1. classifier = KNeighborsClassifier(n_neighbors = k)

      2. classifier.fit(training_points, training_labels)

      3. accuracies.append(classifier.score(validation_data, validation_labels)

      4. print(accuracies)

    3. plt.plot(k_list, accuracies)

    4. plt.show( )

  6. Overfitting - rely too much on training data - K is low

  7. Underfitting - failure to learn from training data - K is too high

Previous
Previous

August 17, 2020

Next
Next

August 12, 2020