Scikit-Learn Crash Course
Scikit-Learn is an essential tool for machine learning with Python
In this post, we’re going to go through the fundamental features of Scikit-Learn. We will approach theory and real world applications in another series.
Until then, this will serve as a way to get familiar with the tools we can use.
Loading Data
Before you begin writing any code, you need to understand where your data is coming from.
Common Data Sources:
CSV
Text
Web
Common Data Forms:
Pandas Dataframes
NumPy Arrays
SciPy Matrices
Train and Test
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=)
Pre-Processing
1. Standardize your data:
Scale -> transorm train -> transorm test
standardScaler().fit(x_train)
.transform(x_train)
.transorm(x_test)
2. Normalize your data:
Normalize -> transform x_train -> transorm test
Normalizer().fit(x_train)
.transform(x_train)
.transform(x_test)
3. Binarize your data:
Binarizer(threshold= int).fit(x_data)
.transform(x_data)
Handling Missing Data
Imputer(missing_values= int, strategy = 'mean', axis = int)
.fit_transform(x_data)
Supervised Learning
Estimators
KNN:
neighbors.KNeighborsClassifier(n_neighbors=5)
Linear Regression:
LinearRegression(normalize=True)
Naive Bayes:
GaussianNB()
Support Vector Machines:
SVC(kernel='linear')
Fitting Models:
your_lr.fit(X, y)
your_knn.fit(X_train, y_train)
your_svc.fit(X_train, y_train)
Predictors:
your_svc.predict(np.random.random((2,5)))
your_lr.predict(X_test)
your_knn.predict_proba(X_test)