Intersection between Code and Justice

How can programming affect the ethics in judicial systems?

I enrolled in an AI program that has sparked my interest in computer science and artificial intelligence. One of the most compelling aspects of this field is that it can be applied to almost any area of our world. From detecting cancer and revolutionizing the medical field to self-driving cars, AI has become an indispensable part of the modern, technological world.

Since then I have been working on a project at the intersection of AI and ethics, one of the more challenging topics in AI. There are many disputed questions about how much responsibility these machines must take. If an AI program makes a wrong diagnosis that kills a hospital patient, who should be blamed: the doctor or the machine?

One of the most pressing issues in the world today is racial injustice. It has infected even institutions that are meant to be impartial, such as the justice system. For this reason, this study is concerned with the integration of AI in the criminal justice system. I will attempt to see whether AI can restore impartiality in justice systems.

(I'd like to take this space to note that I am not an advanced coder but rather an eager student with a great interest in coding and ethics. I will explain the code I show, but this mainly serves as a solid base for future explorations in this area.)


Explanation of the Code behind the Project

We worked with data from COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk-assessment tool used to predict whether a defendant is likely to commit another crime if released. The data comes from Broward County, FL. I first built a logistic regression model with a maximum of 1000 iterations. However, this yielded only 68.3% accuracy, indicating that other models needed to be tested. I then rebuilt the model with RandomForestClassifier, a classifier that allows more flexibility because it is built from decision trees. Yet this yielded an accuracy score of 72.2%, not much better than our previous model. Another method needed to be used...

Data Set

Construction of Models

Building the Logistic Regression Model:

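from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # cap the solver at 1000 iterations

model.score(X_train, y_train)  # accuracy measured on the training data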

Accuracy Score: 0.68345

Building the RandomForestClassifier Model:

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(max_depth=5).fit(X_train, y_train)  # limit each tree to a depth of 5

model.score(X_train, y_train)  # accuracy measured on the training data

Accuracy Score: 0.70133


Confusion Matrices

So why can't we rely on our accuracy scores (besides the fact that the values themselves were low)?

Well, an accuracy score of about 70% tells us nothing about what is happening in the other 30%. In other words, accuracy does not show whether the model wrongly predicted that defendants would reoffend or wrongly predicted that they wouldn't; it only shows what percentage of the data is predicted incorrectly, without any further breakdown.
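To make this concrete, here is a minimal sketch with made-up labels (not the COMPAS data). The two hypothetical sets of predictions, preds_a and preds_b, reach the same accuracy while making opposite kinds of mistakes. All names and values are assumptions chosen for illustration.

```python
# Hypothetical labels: 1 = reoffended, 0 = did not reoffend.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical model A: all three mistakes are false positives.
preds_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# Hypothetical model B: all three mistakes are false negatives.
preds_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def count_errors(y_true, y_pred):
    """Return (number of false positives, number of false negatives)."""
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return fp, fn


for name, preds in [("A", preds_a), ("B", preds_b)]:
    print(name, accuracy(y_true, preds), count_errors(y_true, preds))
```

Both models score 0.7, yet model A would wrongly flag three people as likely to reoffend while model B would wrongly clear three people who did. Accuracy alone cannot tell these two situations apart.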

Confusion matrices are 2 by 2 grids that break down what we couldn't fully understand from our accuracy scores alone. In the layout that scikit-learn plots, the confusion matrices shown below present the number of true negatives (top left), false positives (top right), false negatives (bottom left), and true positives (bottom right) for each race. We are specifically looking at the false positive rate, as this number indicates the share of defendants who did not recidivate but whom the model predicted would. We limited ourselves to 2 races for comparison: the top matrix contains our data on Caucasian defendants and the bottom matrix contains our data on African-American defendants.

[Figure: confusion matrices for Caucasian defendants (top) and African-American defendants (bottom)]
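For reference, scikit-learn's confusion matrix for labels 0 and 1 puts actual outcomes on the rows and predictions on the columns, so the cells read true negatives (top left), false positives (top right), false negatives (bottom left), and true positives (bottom right). The sketch below builds the same grid by hand and splits it by group. The data, the group names "A" and "B", and the helper names are all hypothetical.

```python
def confusion_grid(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual labels, columns are predictions."""
    grid = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        grid[actual][predicted] += 1
    return grid


# Hypothetical toy data: 1 = reoffended, 0 = did not reoffend.
y_true = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]


def grid_for(name):
    """Confusion grid restricted to the rows that belong to one group."""
    rows = [i for i, g in enumerate(group) if g == name]
    return confusion_grid([y_true[i] for i in rows], [y_pred[i] for i in rows])


grid_a = grid_for("A")
grid_b = grid_for("B")
print("Group A:", grid_a)
print("Group B:", grid_b)
```

Comparing the top-right cell of each grid (the false positives) across groups is the same comparison this project makes between the two matrices.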

Building the Confusion Matrices and Calculating the Rates

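from sklearn.linear_model import LogisticRegression
from sklearn.metrics import plot_confusion_matrix  # removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator replaces it

def remove_race(df):
    # Drop the race columns so the model cannot use race directly
    return df.drop(["African-American", "Asian", "Caucasian", "Hispanic", "Native American", "Other"], axis=1)

X_train_new = remove_race(X_train)

X_test_new = remove_race(X_test)

model = LogisticRegression(max_iter=1000).fit(X_train_new, y_train)

# caucasian and african_american are assumed to be the rows of X_test for each group
plot_confusion_matrix(model, remove_race(caucasian), y_test[caucasian.index], values_format='d')

plot_confusion_matrix(model, remove_race(african_american), y_test[african_american.index], values_format='d')

model.score(X_test_new, y_test)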


Caucasian data: False Positive Rate: 0.088, False Negative Rate: 0.590

African American data: False Positive Rate: 0.237, False Negative Rate: 0.356

How to calculate rates for confusion matrices:

TP Rate: TP/(TP+FN)

FP Rate: FP/(FP+TN)

FN Rate: FN/(FN+TP)

TN Rate: TN/(TN + FP)
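The four formulas above can be sketched as a small helper. The function name rates and the counts passed to it are hypothetical, and the sketch assumes there is at least one actual positive and one actual negative so that neither denominator is zero.

```python
def rates(tp, fp, tn, fn):
    """Compute the four confusion-matrix rates from raw counts."""
    actual_pos = tp + fn  # everyone who really did reoffend
    actual_neg = fp + tn  # everyone who really did not reoffend
    return {
        "TPR": tp / actual_pos,
        "FNR": fn / actual_pos,
        "FPR": fp / actual_neg,
        "TNR": tn / actual_neg,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
r = rates(tp=30, fp=10, tn=40, fn=20)
print(r)
```

Because the FP rate and the TN rate share a denominator, they always sum to 1, as do the FN rate and the TP rate. A reported false positive rate of 0.088 therefore implies a true negative rate of 0.912.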


Analysis and Reflection of Results

Both the model results and the confusion matrices demonstrate that a high level of bias in favor of Caucasian defendants remains in the machine learning algorithms: the false positive rate for African-American defendants (0.237) is well above that for Caucasian defendants (0.088). What is most compelling about these results is that they aren't actually surprising at all. The data we pulled into our algorithm comes from a corrupt system that we are attempting to fix. In that sense, the data from which the model learns is flawed, and since we are teaching and testing our model with that same data, it is unrealistic to expect the model to somehow produce different predictions. A new data set needs to be used. The source of this problem in merging justice with artificial intelligence is that we generally believe AI represents a new age of progress, an innovation that will change all aspects of society and the world as we know it. However, these results suggest that in order to change the world with AI, we must first change the world in which we live.