Tackling Pakistan’s Pothole Problem with AI and Automation
Project Overview (Third Semester Project in Computer Engineering):
In Pakistan, potholes are a persistent issue, especially in urban areas like Karachi, Lahore, and Islamabad. Poorly maintained roads contribute to longer travel times, damage to vehicle undercarriages, and, in serious cases, accidents.
Addressing this challenge requires accurate detection and an efficient complaint management system. Our project is built around a simple idea: why not let people report potholes in their vicinity through a mobile application, with complaint classification and resolution automated behind the scenes?
This blog explores the project’s goals and features, discusses some of its technical aspects, and in particular looks at how AI and deep learning make the project effective against fake complaints.
HOW THIS IDEA CAME ABOUT:
Three months ago I was travelling back home to Chakwal via the motorway, checking Google Maps frequently along the way. I noticed that the live tracker didn’t suggest exiting at Neela Dullah Interchange and continuing to Chakwal from there; instead, it drew the blue route line through to the next main exit, Balkasar Interchange.
And as usual, the Balkasar exit was a mess: filled with potholes, the road so broken it looked like earthquake damage.
So I wondered why Google Maps wasn’t suggesting the Neela Dullah exit, since that road was clear and would have gotten me home faster.
I researched it and found that this system was built by engineers in countries where broken roads are rare, while in our third-world country hardly any road is free of potholes.
After that I discussed it with my project partner, and we arrived at a way to solve a small piece of this huge problem: a pothole complaint system.
The Problem It Solves:
Potholes pose a major issue due to:
1. Delayed Complaint Handling: Municipal authorities rely on manual processes to categorize complaints.
2. Fake Complaints: Many complaints are irrelevant, wasting resources.
3. Scattered Data: A centralized database for tracking complaints is often missing.
This system automates the entire process: receiving complaints, verifying pothole images using AI, and categorizing genuine and fake reports.
Features of this Project:
· User-Friendly Android App
Captures pothole images
Automatically fetches the GPS location
Uploads complaints directly to a local Spring Boot server
· Backend Complaint Management
Spring Boot handles complaint storage in a PostgreSQL database
Uploaded images are stored as Base64-encoded strings
A Python Flask service verifies complaints using Azure’s Vision API
(Screenshot: complaint list before implementing fake-image detection)
(Screenshot: complaint list after implementing fake-image detection and the delete feature)
· Genuine Complaint Detection
AI identifies potholes in uploaded images and marks complaints as “Genuine” or “Fake”, as in the screenshots above. In a later section we will discuss how AI and deep learning models detect what is in an image.
· PostgreSQL Database Connection
Each report (its JSON payload data) is saved in the database along with a unique ID.
Understanding the Project Workflow Through a Flowchart
1. User Inputs via Mobile Application
- Fetch Geolocation:
The application captures the user’s current location (latitude and longitude) using the device’s GPS.
- Capture Image:
The user takes a picture of the pothole through the app’s camera interface.
- Prepare JSON Payload:
- The image is converted into a Base64-encoded string so it can be transmitted efficiently.
- The geolocation (address or coordinates) is also included in the payload.
- Example JSON payload:
{
  "address": "Taxila",
  "image": "<Base64-encoded string>"
}
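To make this step concrete, here is a minimal Python sketch of how such a payload could be built; the app itself runs on Android, so this only illustrates the logic, and the file name pothole.jpg is hypothetical.

import base64
import json

# Read the pothole photo and Base64-encode it so it can travel inside JSON.
with open("pothole.jpg", "rb") as f:  # hypothetical file name
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "address": "Taxila",      # the reverse-geocoded location
    "image": encoded_image,   # the Base64-encoded string
}
print(json.dumps(payload)[:80])  # preview the serialized payload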
2. Sending the Payload to the Backend
- Endpoint (the Spring server is running on port 5000):
The JSON payload is sent to the Spring Boot backend via the URL http://localhost:5000.
- The backend receives and processes the payload to store and verify the complaint.
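For illustration, a minimal sketch of posting that payload with Python’s requests library; the exact route on the backend is not shown in this post, so the bare base URL is used here.

import requests

# Post the complaint to the Spring Boot backend described above.
payload = {"address": "Taxila", "image": "<Base64-encoded string>"}
response = requests.post("http://localhost:5000", json=payload, timeout=10)
print(response.status_code, response.text)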
3. Backend Processing (Spring Boot)
- Data Storage:
- The received data is stored in a PostgreSQL database.
- The image is saved in Base64 format along with the address details.
- Integration with Python Flask API:
- The encoded image is forwarded to a Python API for analysis.
- The Spring Boot backend ensures seamless communication between the mobile app and the Python API.
4. Image Verification (Python Flask + Azure Vision API)
- Image Decoding:
The Python Flask API decodes the Base64-encoded image and prepares it for processing.
- Azure Vision API Integration (this Flask server is running on port 8080):
- The image is sent to Azure Vision API for analysis.
- The API checks whether the image contains a pothole using object detection and tagging.
- If a pothole is detected with a confidence level above a defined threshold (e.g., 50%), the complaint is marked as “genuine.”
- If no pothole is detected or confidence is low, the complaint is marked as “fake.”
- Boolean Response:
The Azure Vision API returns a boolean (true or false) based on the presence of a pothole.
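Pulling this stage together, here is a minimal sketch of what such a Flask verification service could look like. It assumes the Azure Computer Vision v3.2 REST “analyze” endpoint with tag-based detection; the /verify route, the resource placeholders, and the presence of a literal “pothole” tag are assumptions (a Custom Vision model trained on pothole images is a common alternative).

import base64

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumptions: replace with your own Azure resource endpoint and key.
AZURE_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
AZURE_KEY = "<your-key>"
CONFIDENCE_THRESHOLD = 0.5  # the ~50% threshold described above

@app.route("/verify", methods=["POST"])  # route name is an assumption
def verify():
    # Decode the Base64 image forwarded by the Spring Boot backend.
    image_bytes = base64.b64decode(request.json["image"])

    # Send the raw bytes to Azure Vision's analyze endpoint for tagging.
    resp = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.2/analyze",
        params={"visualFeatures": "Tags"},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=image_bytes,
        timeout=30,
    )
    tags = resp.json().get("tags", [])

    # Genuine only if a pothole-like tag clears the confidence threshold.
    genuine = any(
        t["name"] == "pothole" and t["confidence"] >= CONFIDENCE_THRESHOLD
        for t in tags
    )
    return jsonify({"genuine": genuine})

if __name__ == "__main__":
    app.run(port=8080)  # the port the Flask service runs on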
5. Display on the Web Application
- Base64 Decoding for Display:
The Base64-encoded images are decoded so they display properly on the frontend web application.
- Frontend User Interface:
- Users and administrators can view complaints categorized into “Genuine” and “Fake.”
- Address and image details are displayed for each complaint.
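One common way to display the stored images is to wrap the Base64 string in a data URI, letting the browser do the decoding itself; a small sketch (the helper name is mine):

# Hypothetical helper: turn a stored Base64 string into a data URI that
# can go straight into an <img> tag's src attribute.
def to_data_uri(encoded_image: str, mime: str = "image/png") -> str:
    return f"data:{mime};base64,{encoded_image}"

# e.g. in a template: <img src="{{ uri }}" alt="pothole report">
uri = to_data_uri("iVBORw0KGgoAAAANS...")  # truncated sample string
print(uri[:40])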
Challenges and Obstacles Faced
· Content related to this project was very scarce on the Internet, so it was difficult to find solutions when problems arose.
· Version compatibility issues were very difficult to resolve; many classes and packages were outdated.
· Initially we built the whole app first and only then worked on the server, which created tons of issues. We’d suggest implementing the small details first and understanding how they work before jumping to the big ones.
· Converting latitude and longitude to an address (reverse geocoding) was a very big challenge; it alone took three days.
· The location wasn’t being shown on the server; we found out that the request was sent before the address conversion had finished, resulting in a null response.
· Establishing the connection to the PostgreSQL database was tricky; we couldn’t find its username and password at first.
· We initially used OpenCV to detect roads, but found that its edge detection relies on patterns and colors; it even classified the patterns on my shirt as a road, LOL.
· The biggest obstacle we tackled was detecting roads in images. It took a week to understand how to run the Python file from the IntelliJ IDE, and doing so broke the normal working of the server. So we hosted the Azure API as its own Flask service on port 8080 while the server ran on port 5000 on the same local machine, then made an endpoint in the server to pass the image to the Flask port and interpret the response.
KEY THEORETICAL CONCEPTS EVERYONE SHOULD KNOW
Why do we use Port Forwarding?
Port forwarding allows external devices or clients to communicate with a specific device or server hosting a web application on a private network through a specific port. It maps an external port on your router to the internal IP and port of a device on your local network, much like what we do in socket programming.
We use it to access a server running on a local device
For exposing web servers, databases, or APIs
Think of a gated community with multiple houses: each house has its own unique number, and whenever we address a specific port, we are referring to that one house.
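To ground the “house number” analogy, here is a minimal socket-programming sketch: the server binds to one specific port, and only clients that knock on that exact port reach it (port 9090 here is arbitrary).

import socket

# Server: claim one "house number" on this machine and wait for a visitor.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 9090))  # arbitrary port for this demo
server.listen(1)
conn, addr = server.accept()    # blocks until a client connects to port 9090
conn.sendall(b"hello from port 9090\n")
conn.close()
server.close()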
What is JSON Payload, and why do we create it?
A JavaScript Object Notation (JSON) payload is the structured data sent from one application to another in a network request.
We use it for:
Ease of parsing
Better data interchange, as it’s lightweight
Cross-language support
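A quick illustration of the ease-of-parsing point: on the receiving side, one json.loads call turns the raw text back into native objects.

import json

raw = '{"address": "Taxila", "image": "iVBORw0KGgo..."}'
complaint = json.loads(raw)  # text -> dict in one call
print(complaint["address"])  # -> Taxila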
How are images, videos, or audio sent through a network, and in how many ways?
Whenever we deal with multimedia transmission over a network, whether it is an image, audio, or video, the data needs to be encoded into a transferable format. Initially I was missing this concept.
Methods to Send Multimedia Data:
1. Base64 Encoding:
o Converts binary data into text format.
o Easily embedded into JSON, works well for APIs.
o Cons: Increases data size by ~33% (see the quick check after this list).
o Example:
An image file gets converted to a Base64 string:
"data:image/png;base64,iVBORw0KGgoAAAANS…"
2. Binary/Byte Array:
o The multimedia file is sent as raw binary data (byte array).
o Pros: Smaller size than Base64.
o Cons: Needs specific handling by the receiving end.
3. File URLs or Cloud Storage Links:
o The media file is uploaded to a cloud server (e.g., AWS S3), and only its link is shared.
o Pros: Efficient and scalable.
o Cons: Requires internet access to download.
4. Chunking (Stream-Based Transfer):
o Media files are split into smaller chunks and streamed piece by piece.
o Useful for large files like videos or audio.
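As flagged in the Base64 item above, the ~33% overhead is easy to verify: every 3 bytes of binary become 4 text characters.

import base64
import os

raw = os.urandom(30_000)        # stand-in for image bytes
encoded = base64.b64encode(raw)
print(len(raw), len(encoded))   # 30000 40000
print(len(encoded) / len(raw))  # 1.333... (~33% larger)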
How does a Flask Server host Azure APIs?
We need the Azure APIs to detect what is in the images. We can either transfer images directly to the web-based models and get a result back, or we can use Flask, which acts as middleware to integrate Azure services into the application using keys and endpoints.
The hosting process is very simple:
Set up the Flask server
Flask runs on a specific port (8080 in my case) and listens for incoming requests
Call the Azure APIs
Flask receives a request with image data; we use the requests library in Python to process and forward the data to the API
Process the Azure response
The Azure API analyzes the data, and the response from Azure is sent back to the Flask server
THEN IT RETURNS US THE RESULT
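To exercise the whole loop, a caller (the Spring Boot backend, or a quick test script) could hit the Flask service like this; the /verify route matches the earlier sketch and is an assumption, as is the file name.

import base64
import requests

with open("pothole.jpg", "rb") as f:  # hypothetical test image
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Flask is listening on port 8080, as described above.
resp = requests.post("http://localhost:8080/verify",
                     json={"image": encoded}, timeout=30)
print(resp.json())  # e.g. {"genuine": true}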
Understanding how AI Models work with images and other data and produce results
Understanding how these large-scale models work is always a critical thing to study. After all, these powerful models, whether for object detection or LLMs, are the culmination of decades of AI research. I’ll do my best to explain every basic detail so your concepts about these models become clear.
How basic Classification Machine Learning Models work
Let’s assume I have 20 songs, and I want to classify which of them are classical (qawwali-type) and which are pop. Normally we’d create a simple ML model in Python, train it, and test it. But how does this model internally decide about the songs?
We know we can define two metrics for these songs, energy and length, and what usually happens is that
Pop songs have high energy but short length
While classical songs have low energy but long length
So what we do is draw a graph and plot the songs as ordered pairs on it.
KEEP IN MIND THAT ML RUNS ON A COMPUTER, AND A COMPUTER ONLY UNDERSTANDS NUMERIC INPUT.
When we visualize the data, we can easily see that high-energy, shorter songs are primarily pop, while lower-energy, longer songs are mostly classical, which makes sense.
Labeling each song by hand, and of course in a huge dataset, takes time. Instead, a model can learn the relationship between song metrics and genre and then make predictions.
NOTICE THAT THIS WAS A CLASSIFICATION PROBLEM, BECAUSE THE OUTCOME IS ONE CLASS CHOSEN FROM A SET OF CLASSES.
What if a song input is a mashup, a mixture of genres? In that case we can either use a clustering approach or KNN to predict the outcome. Also note that this example was linear, which is quite easy; models usually work with far more complex boundaries than this.
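Here is a minimal scikit-learn sketch of the song example. The (energy, length) numbers are made up for illustration; a real pipeline would use far more data and features.

from sklearn.neighbors import KNeighborsClassifier

# Each song is an (energy, length-in-minutes) pair -- numeric input only.
X = [
    [0.90, 3.2], [0.80, 3.5], [0.85, 3.0],   # pop: high energy, short
    [0.30, 9.0], [0.20, 12.0], [0.35, 10.5], # classical: low energy, long
]
y = ["pop", "pop", "pop", "classical", "classical", "classical"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
print(model.predict([[0.88, 3.1]]))  # -> ['pop']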
What if the input is an image, like in our project?
Let’s make this interesting: assume you have a herd of sheep to protect, and you have to create a machine learning solution that detects intruders and sounds the alarm only when it sees a tiger, but not when it sees a cat or a fox.
Now, the thing to understand behind the scenes is that our input is now an image.
WE ALREADY KNOW THIS IS ALSO A CLASSIFICATION PROBLEM: THE OUTPUT IS ONE OUT OF MANY CLASSES, LIKE IN A MULTIPLEXER.
But how does a computer turn an image into numeric input? Our earlier example used purely numerical input in ordered-pair form, so what now?
KEEP IN MIND THAT IMAGES ARE NUMERIC TOO.
Assume I gave you 150,000 pixels of an image, each with a position and an RGB value, and asked you what the image shows. You certainly couldn’t give the answer, but computers are built to take exactly this kind of input, process it, and generate outputs.
We can feed the pixels directly into an ML model, ignoring any spatial structure. The problem then is how to distinguish among the animals at all: the model has to learn from scratch the mapping, or any relationship, between those raw pixels and the image labels.
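To make “images are numeric too” concrete, here is a sketch that flattens an image into the kind of raw-pixel vector a naive model would receive; note how all spatial structure is thrown away.

import numpy as np

# A fake 150x150 RGB image: 22,500 pixels, 67,500 numbers in total.
image = np.random.randint(0, 256, size=(150, 150, 3), dtype=np.uint8)

features = image.reshape(-1)  # flatten to one long vector
print(features.shape)         # (67500,) -- what a naive classifier sees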
How Azure Vision AI Detects Objects in Images
- Image Preprocessing:
When an image is uploaded, the AI preprocesses the data: the image is broken into numerical matrices where each pixel is represented by its RGB values. A 150 x 150 pixel image has 22,500 pixels, which means 67,500 data points (150 x 150 x 3 for RGB).
The image is then resized so that it fits the expected input of the model (see the sketch at the end of this post).
- Feature Extraction:
In deep learning terms (deep learning being a subset of ML), we talk about Convolutional Neural Networks (CNNs). Azure AI uses CNNs to extract features from images, like edges, textures, and patterns.
FOR EXAMPLE: if detecting a tiger, features like stripes, fur texture, and body shape are identified and further processed.
- Model Training and Object Detection:
There are pre-trained models in Azure, already equipped with knowledge of numerous classes (cats, tigers, dogs, etc.). The model takes the features extracted by the CNN and compares them with its learned patterns to calculate probabilities across the classes.
- Bounding Box and Object Localization:
For object detection, the AI also identifies the location of objects in the image. This involves drawing bounding boxes.
For example, the system identifies a tiger by marking its position in the image and giving it a probability score.
- Decision Making:
Based on the detection, we define our actions. In our case, the service returned a boolean to segregate fake images from genuine ones.
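As referenced in the preprocessing step above, here is a minimal sketch of resizing an image to a model’s expected input using Pillow and NumPy. The 224 x 224 size is a common CNN input, and the file name is hypothetical.

import numpy as np
from PIL import Image

# Load the uploaded photo and resize it to the model's expected input.
img = Image.open("upload.jpg").convert("RGB")
img = img.resize((224, 224))  # a common CNN input size

pixels = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
print(pixels.shape)  # (224, 224, 3)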