August 13, 2020
Breast Cancer Classifier Project
It is common practice to split function parameters on separate lines when there are too many for a single line:
Example: func(
    param1,
    param2,
    param3,
    param4
)
Example: Unpack the arrays returned by a Python function into separate variables
array = [1, 2, 3]
first, second, third = array
training_data, validation_data, training_labels, validation_labels = train_test_split(
    breast_cancer_data.data,
    breast_cancer_data.target,
    test_size=0.2,
    random_state=100
)
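The same unpacking works for any function that returns several values at once. A minimal sketch, where split_in_half is a hypothetical helper made up for illustration (it is not part of scikit-learn):

```python
def split_in_half(items):
    """Hypothetical helper: return the first and second half of a list."""
    middle = len(items) // 2
    # Returning two values separated by a comma produces a tuple.
    return items[:middle], items[middle:]

# One name on the left for each returned value.
first_half, second_half = split_in_half([1, 2, 3, 4])
print(first_half, second_half)
```

If the number of names on the left does not match the number of returned values, Python raises a ValueError.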
Use the classifier with k = 3 and check the score:
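from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(training_data, training_labels)
print(classifier.score(validation_data, validation_labels))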
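The snippets above leave out the imports and the step that loads the data. A minimal end-to-end sketch, assuming scikit-learn is installed and that breast_cancer_data comes from scikit-learn's load_breast_cancer (the notes do not show where it is loaded, so that part is an assumption):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the notes use scikit-learn's bundled breast cancer dataset.
breast_cancer_data = load_breast_cancer()

# Hold out 20% of the samples for validation; random_state fixes the shuffle.
training_data, validation_data, training_labels, validation_labels = train_test_split(
    breast_cancer_data.data,
    breast_cancer_data.target,
    test_size=0.2,
    random_state=100,
)

# Fit a 3-nearest-neighbors classifier and score it on the held-out data.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(training_data, training_labels)
score = classifier.score(validation_data, validation_labels)
print(score)
```

score is the fraction of validation samples classified correctly, so it always falls between 0 and 1.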
k was hardcoded to 3. What if we loop over k from 1 to 100 to see which value is most accurate?
import matplotlib.pyplot as plt
k_list = range(1, 101)
accuracies = []
for k in k_list:
    classifier = KNeighborsClassifier(n_neighbors=k)
    classifier.fit(training_data, training_labels)
    accuracies.append(classifier.score(validation_data, validation_labels))
print(accuracies)
plt.plot(k_list, accuracies)
plt.show()
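Instead of reading the best k off the plot, it can also be picked out in code. A small sketch, using made-up accuracy values in place of the real results from the loop:

```python
# Illustrative values only; the real accuracies come from the loop over k.
k_list = [1, 2, 3, 4, 5]
accuracies = [0.91, 0.93, 0.96, 0.95, 0.96]

best_accuracy = max(accuracies)
# index() returns the first match, so ties resolve to the smallest k.
best_k = k_list[accuracies.index(best_accuracy)]
print(best_k, best_accuracy)
```

Resolving ties toward the smallest k is a choice made here for simplicity; other tie-breaking rules are equally reasonable.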
Overfitting - the model relies too much on the training data, including its outliers - happens when k is too low
Underfitting - the model fails to learn enough from the training data - happens when k is too high
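Both effects can be seen on a tiny example. The sketch below uses a hand-written 1-D nearest-neighbors vote (knn_predict is a hypothetical helper, not scikit-learn's implementation) and toy data made up for illustration:

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Hypothetical helper: majority label among the k points nearest to query."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: abs(train_points[i] - query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: class 0 sits low, class 1 sits high,
# plus one outlier of class 1 at 2.5 among the class 0 points.
points = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 8.0, 9.0]
labels = [0, 0, 1, 0, 0, 0, 1, 1]

# k too low: a single outlier decides the prediction (overfitting).
low_k_near_outlier = knn_predict(points, labels, 2.6, k=1)
# Moderate k: the outlier is outvoted by its class 0 neighbors.
mid_k_near_outlier = knn_predict(points, labels, 2.6, k=3)
# Moderate k also still classifies the class 1 region correctly.
mid_k_high_region = knn_predict(points, labels, 8.5, k=3)
# k too high: every point votes, so the majority class always wins (underfitting).
high_k_high_region = knn_predict(points, labels, 8.5, k=len(points))

print(low_k_near_outlier, mid_k_near_outlier, mid_k_high_region, high_k_high_region)
```

With k = 1 the query next to the outlier is labeled 1. With k equal to the size of the training set, the query at 8.5 is labeled 0 even though both of its closest neighbors are class 1.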