Tackling Pakistan’s Pothole Problem with AI and Automation
Project Overview (Third Semester Project in Computer Engineering):
In Pakistan, potholes are a persistent issue, especially in urban areas like Karachi, Lahore, and Islamabad. Poorly maintained roads contribute to longer travel times, damage to vehicle undercarriages, and, in serious cases, accidents.
Addressing this challenge requires accurate detection and an efficient complaint management system. Our project is built around a simple idea: why not let people report potholes in their vicinity through a mobile application, with complaint classification and resolution automated behind the scenes?
This blog explores the project’s goals and features, discusses some of its technical aspects, and in particular looks at how AI and deep learning make the project effective against fake complaints.
HOW THIS IDEA CAME ABOUT:
Three months ago I was travelling back home to Chakwal via the motorway, checking Google Maps frequently along the way. I noticed that the live tracker didn’t suggest exiting at Neela Dullah Interchange and continuing to Chakwal from there; instead, it drew the blue route line through to the next main exit, Balkasar Interchange.
And as usual, the Balkasar exit was a mess: filled with potholes, the road so broken it looked like earthquake damage.
So I wondered why Google Maps wasn’t suggesting the Neela Dullah exit, since that road was clear and would have gotten me home faster.
I researched it and found that this system was built by engineers in countries where broken roads are rare, while in our third-world country hardly any road is free of potholes.
After that I discussed it with my project partner, and we arrived at a way to solve a small piece of this huge problem: a pothole complaint system.
The Problem It Solves:
Potholes pose a major issue due to:
1. Delayed Complaint Handling: Municipal authorities rely on manual processes to categorize complaints.
2. Fake Complaints: Many complaints are irrelevant, wasting resources.
3. Scattered Data: A centralized database for tracking complaints is often missing.
This system automates the entire process: receiving complaints, verifying pothole images using AI, and categorizing genuine and fake reports.
Features of this Project:
· User-Friendly Android App
Captures pothole images
Automatically fetches the GPS location
Uploads complaints directly to a local Spring Boot server
· Backend Complaint Management
Spring Boot handles complaint storage in a PostgreSQL database
Uploaded images are stored as Base64-encoded strings
A Python Flask service verifies complaints using Azure’s Vision API
(Screenshot: complaint list before implementing fake-image detection)
(Screenshot: complaint list after implementing fake-image detection and the delete feature)
· Genuine Complaint Detection
AI identifies potholes in uploaded images and marks complaints as “Genuine” or “Fake”, as in the screenshots above. In a later section we will discuss how AI and deep learning models detect what is in an image.
· PostgreSQL Database Connection
Each report (its JSON payload data) is saved in the database along with a unique ID.
Understanding the Project Workflow Through a Flowchart
1. User Inputs via Mobile Application
- Fetch Geolocation:
The application captures the user’s current location (latitude and longitude) using the device’s GPS.
- Capture Image:
The user takes a picture of the pothole through the app’s camera interface.
- Prepare JSON Payload:
- The image is converted into a Base64-encoded string so it can be transmitted efficiently.
- The geolocation (address or coordinates) is also included in the payload.
- Example JSON payload:
{
  "address": "Taxila",
  "image": "<Base64-encoded string>"
}
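To make this step concrete, here is a minimal Python sketch of how such a payload could be built; the app itself runs on Android, so this only illustrates the logic, and the file name pothole.jpg is hypothetical.

import base64
import json

# Read the pothole photo and Base64-encode it so it can travel inside JSON.
with open("pothole.jpg", "rb") as f:  # hypothetical file name
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "address": "Taxila",      # the reverse-geocoded location
    "image": encoded_image,   # the Base64-encoded string
}
print(json.dumps(payload)[:80])  # preview the serialized payload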
2. Sending the Payload to the Backend
- Endpoint (the Spring server is running on port 5000):
The JSON payload is sent to the Spring Boot backend via the URL http://localhost:5000.
- The backend receives and processes the payload to store and verify the complaint.
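For illustration, a minimal sketch of posting that payload with Python’s requests library; the exact route on the backend is not shown in this post, so the bare base URL is used here.

import requests

# Post the complaint to the Spring Boot backend described above.
payload = {"address": "Taxila", "image": "<Base64-encoded string>"}
response = requests.post("http://localhost:5000", json=payload, timeout=10)
print(response.status_code, response.text)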
3. Backend Processing (Spring Boot)
- Data Storage:
- The received data is stored in a PostgreSQL database.
- The image is saved in Base64 format along with the address details.
- Integration with Python Flask API:
- The encoded image is forwarded to a Python API for analysis.
- The Spring Boot backend ensures seamless communication between the mobile app and the Python API.
4. Image Verification (Python Flask + Azure Vision API)
- Image Decoding:
The Python Flask API decodes the Base64-encoded image and prepares it for processing.
- Azure Vision API Integration (this Flask server is running on port 8080):
- The image is sent to Azure Vision API for analysis.
- The API checks whether the image contains a pothole using object detection and tagging.
- If a pothole is detected with a confidence level above a defined threshold (e.g., 50%), the complaint is marked as “genuine.”
- If no pothole is detected or confidence is low, the complaint is marked as “fake.”
- Boolean Response:
The Azure Vision API returns a boolean (true or false) based on the presence of a pothole.
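Pulling this stage together, here is a minimal sketch of what such a Flask verification service could look like. It assumes the Azure Computer Vision v3.2 REST “analyze” endpoint with tag-based detection; the /verify route, the resource placeholders, and the presence of a literal “pothole” tag are assumptions (a Custom Vision model trained on pothole images is a common alternative).

import base64

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumptions: replace with your own Azure resource endpoint and key.
AZURE_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
AZURE_KEY = "<your-key>"
CONFIDENCE_THRESHOLD = 0.5  # the ~50% threshold described above

@app.route("/verify", methods=["POST"])  # route name is an assumption
def verify():
    # Decode the Base64 image forwarded by the Spring Boot backend.
    image_bytes = base64.b64decode(request.json["image"])

    # Send the raw bytes to Azure Vision's analyze endpoint for tagging.
    resp = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.2/analyze",
        params={"visualFeatures": "Tags"},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=image_bytes,
        timeout=30,
    )
    tags = resp.json().get("tags", [])

    # Genuine only if a pothole-like tag clears the confidence threshold.
    genuine = any(
        t["name"] == "pothole" and t["confidence"] >= CONFIDENCE_THRESHOLD
        for t in tags
    )
    return jsonify({"genuine": genuine})

if __name__ == "__main__":
    app.run(port=8080)  # the port the Flask service runs on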
5. Display on the Web Application
- Base64 Decoding for Display:
The Base64-encoded images are decoded so they display properly on the frontend web application.
- Frontend User Interface:
- Users and administrators can view complaints categorized into “Genuine” and “Fake.”
- Address and image details are displayed for each complaint.
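One common way to display the stored images is to wrap the Base64 string in a data URI, letting the browser do the decoding itself; a small sketch (the helper name is mine):

# Hypothetical helper: turn a stored Base64 string into a data URI that
# can go straight into an <img> tag's src attribute.
def to_data_uri(encoded_image: str, mime: str = "image/png") -> str:
    return f"data:{mime};base64,{encoded_image}"

# e.g. in a template: <img src="{{ uri }}" alt="pothole report">
uri = to_data_uri("iVBORw0KGgoAAAANS...")  # truncated sample string
print(uri[:40])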
Challenges and Obstacles Faced
· Content related to this project was very scarce on the Internet, so it was difficult to find solutions when problems arose.
· Version compatibility issues were very difficult to resolve; many classes and packages were outdated.
· Initially we built the whole app first and only then worked on the server, which created tons of issues. We’d suggest implementing the small details first and understanding how they work before jumping to the big ones.
· Converting latitude and longitude to an address (reverse geocoding) was a very big challenge; it alone took three days.
· The location wasn’t being shown on the server; we found out that the request was sent before the address conversion had finished, resulting in a null response.
· Establishing the connection to the PostgreSQL database was tricky; we couldn’t find its username and password at first.
· We initially used OpenCV to detect roads, but found that its edge detection relies on patterns and colors; it even classified the patterns on my shirt as a road, LOL.
· The biggest obstacle we tackled was detecting roads in images. It took a week to understand how to run the Python file from the IntelliJ IDE, and doing so broke the normal working of the server. So we hosted the Azure API as its own Flask service on port 8080 while the server ran on port 5000 on the same local machine, then made an endpoint in the server to pass the image to the Flask port and interpret the response.
KEY THEORETICAL CONCEPTS EVERYONE SHOULD KNOW
Why do we use Port Forwarding?
Port forwarding allows external devices or clients to communicate with a specific device or server hosting a web application on a private network through a specific port. It maps an external port on your router to the internal IP and port of a device on your local network, much like what we do in socket programming.
We use it to access a server running on a local device
For exposing web servers, databases, or APIs
Think of a gated community with multiple houses: each house has its own unique number, and whenever we address a specific port, we are referring to that one house.
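To ground the “house number” analogy, here is a minimal socket-programming sketch: the server binds to one specific port, and only clients that knock on that exact port reach it (port 9090 here is arbitrary).

import socket

# Server: claim one "house number" on this machine and wait for a visitor.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 9090))  # arbitrary port for this demo
server.listen(1)
conn, addr = server.accept()    # blocks until a client connects to port 9090
conn.sendall(b"hello from port 9090\n")
conn.close()
server.close()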
What is JSON Payload, and why do we create it?
A JavaScript Object Notation (JSON) payload is the structured data sent from one application to another in a network request.
We use it for:
Ease of parsing
Better data interchange, as it’s lightweight
Cross-language support
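A quick illustration of the ease-of-parsing point: on the receiving side, one json.loads call turns the raw text back into native objects.

import json

raw = '{"address": "Taxila", "image": "iVBORw0KGgo..."}'
complaint = json.loads(raw)  # text -> dict in one call
print(complaint["address"])  # -> Taxila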
How are images, videos, or audio sent through a network, and in how many ways?
Whenever we deal with multimedia transmission over a network, whether it is an image, audio, or video, the data needs to be encoded into a transferable format. Initially I was missing this concept.
Methods to Send Multimedia Data:
1. Base64 Encoding:
o Converts binary data into text format.
o Easily embedded into JSON, works well for APIs.
o Cons: Increases data size by ~33% (see the quick check after this list).
o Example:
An image file gets converted to a Base64 string:
"data:image/png;base64,iVBORw0KGgoAAAANS…"
2. Binary/Byte Array:
o The multimedia file is sent as raw binary data (byte array).
o Pros: Smaller size than Base64.
o Cons: Needs specific handling by the receiving end.
3. File URLs or Cloud Storage Links:
o The media file is uploaded to a cloud server (e.g., AWS S3), and only its link is shared.
o Pros: Efficient and scalable.
o Cons: Requires internet access to download.
4. Chunking (Stream-Based Transfer):
o Media files are split into smaller chunks and streamed piece by piece.
o Useful for large files like videos or audio.
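As flagged in the Base64 item above, the ~33% overhead is easy to verify: every 3 bytes of binary become 4 text characters.

import base64
import os

raw = os.urandom(30_000)        # stand-in for image bytes
encoded = base64.b64encode(raw)
print(len(raw), len(encoded))   # 30000 40000
print(len(encoded) / len(raw))  # 1.333... (~33% larger)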
How does a Flask Server host Azure APIs?
We need the Azure APIs to detect what is in the images. We can either transfer images directly to the web-based models and get a result back, or we can use Flask, which acts as middleware to integrate Azure services into the application using keys and endpoints.
The hosting process is very simple:
Set up the Flask server
Flask runs on a specific port (8080 in my case) and listens for incoming requests
Call the Azure APIs
Flask receives a request with image data; we use the requests library in Python to process and forward the data to the API
Process the Azure response
The Azure API analyzes the data, and the response from Azure is sent back to the Flask server
THEN IT RETURNS US THE RESULT
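To exercise the whole loop, a caller (the Spring Boot backend, or a quick test script) could hit the Flask service like this; the /verify route matches the earlier sketch and is an assumption, as is the file name.

import base64
import requests

with open("pothole.jpg", "rb") as f:  # hypothetical test image
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Flask is listening on port 8080, as described above.
resp = requests.post("http://localhost:8080/verify",
                     json={"image": encoded}, timeout=30)
print(resp.json())  # e.g. {"genuine": true}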
Understanding how AI Models work with images and other data and produce results
Understanding how these large-scale models work is always a critical thing to study. After all, these powerful models, whether for object detection or LLMs, are the culmination of decades of AI research. I’ll do my best to explain every basic detail so your concepts about these models become clear.
How basic Classification Machine Learning Models work
Let’s assume I have 20 songs, and I want to classify which of them are classical (qawwali-type) and which are pop. Normally we’d create a simple ML model in Python, train it, and test it. But how does this model internally decide about the songs?
We know we can define two metrics for these songs, energy and length, and what usually happens is that
Pop songs have high energy but short length
While classical songs have low energy but long length
So what we do is draw a graph and plot the songs as ordered pairs on it.
KEEP IN MIND THAT ML RUNS ON A COMPUTER, AND A COMPUTER ONLY UNDERSTANDS NUMERIC INPUT.
When we visualize the data, we can easily see that high-energy, shorter songs are primarily pop, while lower-energy, longer songs are mostly classical, which makes sense.
Labeling each song by hand, and of course in a huge dataset, takes time. Instead, a model can learn the relationship between song metrics and genre and then make predictions.
NOTICE THAT THIS WAS A CLASSIFICATION PROBLEM, BECAUSE THE OUTCOME IS ONE CLASS CHOSEN FROM A SET OF CLASSES.
What if a song input is a mashup, a mixture of genres? In that case we can either use a clustering approach or KNN to predict the outcome. Also note that this example was linear, which is quite easy; models usually work with far more complex boundaries than this.
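Here is a minimal scikit-learn sketch of the song example. The (energy, length) numbers are made up for illustration; a real pipeline would use far more data and features.

from sklearn.neighbors import KNeighborsClassifier

# Each song is an (energy, length-in-minutes) pair -- numeric input only.
X = [
    [0.90, 3.2], [0.80, 3.5], [0.85, 3.0],   # pop: high energy, short
    [0.30, 9.0], [0.20, 12.0], [0.35, 10.5], # classical: low energy, long
]
y = ["pop", "pop", "pop", "classical", "classical", "classical"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
print(model.predict([[0.88, 3.1]]))  # -> ['pop']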
What if the input is an image, like in our project?
Let’s make this interesting: assume you have a herd of sheep to protect, and you have to create a machine learning solution that detects intruders and sounds the alarm only when it sees a tiger, but not when it sees a cat or a fox.
Now, the thing to understand behind the scenes is that our input is now an image.
WE ALREADY KNOW THIS IS ALSO A CLASSIFICATION PROBLEM: THE OUTPUT IS ONE OUT OF MANY CLASSES, LIKE IN A MULTIPLEXER.
But how does a computer turn an image into numeric input? Our earlier example used purely numerical input in ordered-pair form, so what now?
KEEP IN MIND THAT IMAGES ARE NUMERIC TOO.
Assume I gave you 150,000 pixels of an image, each with a position and an RGB value, and asked you what the image shows. You certainly couldn’t give the answer, but computers are built to take exactly this kind of input, process it, and generate outputs.
We can feed the pixels directly into an ML model, ignoring any spatial structure. The problem then is how to distinguish among the animals at all: the model has to learn from scratch the mapping, or any relationship, between those raw pixels and the image labels.
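To make “images are numeric too” concrete, here is a sketch that flattens an image into the kind of raw-pixel vector a naive model would receive; note how all spatial structure is thrown away.

import numpy as np

# A fake 150x150 RGB image: 22,500 pixels, 67,500 numbers in total.
image = np.random.randint(0, 256, size=(150, 150, 3), dtype=np.uint8)

features = image.reshape(-1)  # flatten to one long vector
print(features.shape)         # (67500,) -- what a naive classifier sees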
How Azure Vision AI Detects Objects in Images
- Image Preprocessing:
When an image is uploaded, the AI preprocesses the data: the image is broken into numerical matrices where each pixel is represented by its RGB values. A 150 x 150 pixel image has 22,500 pixels, which means 67,500 data points (150 x 150 x 3 for RGB).
The image is then resized so that it fits the expected input of the model (see the sketch at the end of this post).
- Feature Extraction:
In deep learning terms (deep learning being a subset of ML), we talk about Convolutional Neural Networks (CNNs). Azure AI uses CNNs to extract features from images, like edges, textures, and patterns.
FOR EXAMPLE: if detecting a tiger, features like stripes, fur texture, and body shape are identified and further processed.
- Model Training and Object Detection:
There are pre-trained models in Azure, already equipped with knowledge of numerous classes (cats, tigers, dogs, etc.). The model takes the features extracted by the CNN and compares them with its learned patterns to calculate probabilities across the classes.
- Bounding Box and Object Localization:
For object detection, the AI also identifies the location of objects in the image. This involves drawing bounding boxes.
For example, the system identifies a tiger by marking its position in the image and giving it a probability score.
- Decision Making:
Based on the detection, we define our actions. In our case, the service returned a boolean to segregate fake images from genuine ones.
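As referenced in the preprocessing step above, here is a minimal sketch of resizing an image to a model’s expected input using Pillow and NumPy. The 224 x 224 size is a common CNN input, and the file name is hypothetical.

import numpy as np
from PIL import Image

# Load the uploaded photo and resize it to the model's expected input.
img = Image.open("upload.jpg").convert("RGB")
img = img.resize((224, 224))  # a common CNN input size

pixels = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
print(pixels.shape)  # (224, 224, 3)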