Logistic Regression

Despite its name, logistic regression is actually a classification method. It can be applied to both binary and multi-class classification tasks.

Binary Classification

Logistic regression is a linear model: it first computes a score \(z\) from the linear equation

\[ z = \omega x + b, \]

where \(\omega\) is the weight of the model, \(x\) is the input (the independent variable), and \(b\) is the bias. For classification, the output \(z\) is fed into the sigmoid function, which maps it to a value in \((0, 1)\) that can be read as the probability of class 1:

\[ \hat{z} = \frac{1}{1 + e^{-z}} \]

Finally, the predicted label \(\hat{y}\) is obtained by thresholding \(\hat{z}\) at 0.5:

\[ \hat{y} = \begin{cases} 0 & \text{if } \hat{z} < 0.5\\ 1 & \text{otherwise} \end{cases} \]
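The three steps above (linear score, sigmoid, threshold) can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation referenced later in this section; the function names `sigmoid` and `predict_binary`, the weights, and the inputs are assumptions made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_binary(X, w, b, threshold=0.5):
    """Return (sigmoid outputs, predicted labels) for the rows of X."""
    z = X @ w + b                              # linear score z = w x + b
    z_hat = sigmoid(z)                         # sigmoid output
    y_hat = (z_hat >= threshold).astype(int)   # 0 if z_hat < 0.5, otherwise 1
    return z_hat, y_hat

# Hypothetical weights and inputs, for illustration only.
w = np.array([2.0, -1.0])
b = -0.5
X = np.array([[1.0, 0.5],     # z =  1.0
              [0.0, 1.0],     # z = -1.5
              [0.25, 0.0]])   # z =  0.0, exactly on the decision boundary
z_hat, y_hat = predict_binary(X, w, b)
print(z_hat)
print(y_hat)
```

Note that a point with \(\hat{z} = 0.5\) exactly falls into the "otherwise" branch and is labeled 1, which is why the comparison in the sketch is `>=`.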

Multi-class Classification

In the multi-class task, it is assumed that there are \(C\) classes. For each data point \(x_{i}\in X\), where \(X\) is the set of data points, the model produces one score \(z_{c}\) per class \(c \in \{1, \dots, C\}\). These scores are fed into the softmax function to predict the probability that \(x_{i}\) belongs to class \(c\):

\[ \hat{z}_{c} = \frac{\exp(z_{c})}{\sum_{k = 1}^{C} \exp(z_{k})} \]
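As a small sketch of the softmax step, the snippet below converts a vector of class scores into probabilities. The scores are hypothetical, and subtracting the maximum score before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis (one score per class)."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # stability shift, result unchanged
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for C = 3 classes
probs = softmax(scores)
print(probs)        # one probability per class
print(probs.sum())  # the probabilities sum to 1 (up to rounding)
```

The predicted class is the one with the largest probability, for example `int(np.argmax(probs))`.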

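Assuming that our data is independent and identically distributed (IID), the total loss is the sum of the per-point losses. For a single data point with a one-hot label \(y\) (\(y_{c} = 1\) for the true class and 0 otherwise), the cross-entropy loss is computed from the softmax probabilities \(\hat{z}_{c}\):

\[ \mathcal{L} = -\sum_{c = 1}^{C}y_{c}\log(\hat{z}_{c}) \]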
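The loss for a single data point can be checked numerically with a short sketch. The scores and the one-hot label below are hypothetical, and the small constant `eps` is an assumption added only to guard against `log(0)`.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis (one score per class)."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -np.sum(y_onehot * np.log(probs + eps), axis=-1)

scores = np.array([2.0, 1.0, 0.1])    # hypothetical class scores
y_true = np.array([1.0, 0.0, 0.0])    # the point belongs to the first class
y_wrong = np.array([0.0, 0.0, 1.0])   # same scores, but a different true class

probs = softmax(scores)
loss_true = cross_entropy(y_true, probs)
loss_wrong = cross_entropy(y_wrong, probs)
print(loss_true, loss_wrong)
```

Because the label is one-hot, only the term of the true class survives the sum, so the loss reduces to the negative log-probability assigned to that class. The loss is therefore smaller when the true class has the highest score (`loss_true`) than when it has the lowest (`loss_wrong`).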

Implementation

A logistic regression model for binary classification was developed from scratch in Python, following the guidelines provided in [2]. The complete implementation script is available at Logistic Regression (Binary) from Scratch. The corresponding model for multi-class classification was also developed from scratch and is available at Logistic Regression (Multi-Class) from Scratch.
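For orientation, a from-scratch binary model of this kind can be sketched as below. This is not the script referenced above: the class name `LogisticRegressionGD`, the use of batch gradient descent on the mean binary cross-entropy, the hyperparameters, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

class LogisticRegressionGD:
    """Binary logistic regression fitted by batch gradient descent (illustrative)."""

    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr            # learning rate (assumed value)
        self.n_iters = n_iters  # number of gradient steps (assumed value)
        self.w = None
        self.b = 0.0

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.n_iters):
            z_hat = self._sigmoid(X @ self.w + self.b)
            error = z_hat - y
            # Gradients of the mean binary cross-entropy w.r.t. w and b.
            self.w -= self.lr * (X.T @ error) / n_samples
            self.b -= self.lr * error.mean()
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)

# Synthetic, well-separated toy data (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegressionGD().fit(X, y)
accuracy = (model.predict(X) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A multi-class variant would follow the same pattern, replacing the sigmoid with softmax and the weight vector with one weight vector per class.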

We also developed a logistic regression classifier for the Breast Cancer dataset using the built-in functions provided in scikit-learn [1] in Python; the model is imported from sklearn.linear_model. The following screenshot illustrates the training process and the evaluation results of the model. The complete implementation script is available on the GitHub page Logistic Regression for Binary Classification.
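A minimal sketch of such a scikit-learn workflow is shown below. The train/test split, the feature standardization, `max_iter=1000`, and the random seed are assumptions made for this example; they are not taken from the referenced script, so the printed accuracy is not expected to match the results in the screenshot.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Breast Cancer dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the data for evaluation (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardizing the features helps the solver converge (assumed preprocessing).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```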

We also conducted the experiment using a multi-class logistic regression model. For this purpose, we utilized the Iris dataset and trained the model under two different configurations. First, we fitted the model using the One-vs-Rest (OvR) classification strategy with the liblinear solver, which supports both L1 and L2 regularization. In the second configuration, we developed the model using the multinomial classification approach with the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) solver. The corresponding results are presented in the following screenshot. The complete implementation script is available on the GitHub page Logistic Regression for Multi-Class Classification.
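The two configurations can be sketched as follows. This is an illustration under stated assumptions, not the referenced script: OvR is expressed here by wrapping a liblinear model in `OneVsRestClassifier`, the multinomial model relies on the behaviour of the `lbfgs` solver on multi-class data in recent scikit-learn versions, and the split, seed, and `max_iter` are made up for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Configuration 1: One-vs-Rest, one binary liblinear model per class.
# penalty="l2" is shown explicitly; liblinear also accepts penalty="l1".
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear", penalty="l2"))
ovr.fit(X_train, y_train)

# Configuration 2: a single multinomial (softmax) model fitted with L-BFGS.
multinomial = LogisticRegression(solver="lbfgs", max_iter=1000)
multinomial.fit(X_train, y_train)

ovr_acc = ovr.score(X_test, y_test)
multi_acc = multinomial.score(X_test, y_test)
print(f"OvR / liblinear accuracy:     {ovr_acc:.3f}")
print(f"Multinomial / L-BFGS accuracy: {multi_acc:.3f}")
```

The OvR wrapper trains three independent binary classifiers (one per Iris class), whereas the multinomial model fits all classes jointly through the softmax function described earlier.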

References

[1] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[2] M. Turp, “How to implement logistic regression from scratch with Python,” AssemblyAI, accessed: September 14, 2022, https://www.youtube.com/watch?v=YYEJ_GUguHw.