Skip to main content

Grab builds smarter maps for Southeast Asia with vision fine-tuning

This image features the logo of Grab, a Southeast Asian technology company known for its super app that provides ride-hailing, food delivery, and financial services. The logo is displayed against a green and yellow abstract background.

Grab(opens in a new window) is a leading food delivery and rideshare company in Southeast Asia, servicing almost 42 million monthly users across eight countries.

With more than 6 million driver-partners registered on the platform and 3.5 billion annual transactions in 2023, Grab’s impact extends beyond ride-hailing and food delivery. Grab turns street-level imagery collected from their drivers into mapping data used to power GrabMaps⁠, a mapping service enabling all of their Southeast Asia operations. 

GPT-4o vision fine-tuning enables the service to correctly localize traffic signs and count lane dividers to refine the mapping data. Today, GrabMaps not only supports Grab’s services but also powers enterprises with its location intelligence capabilities.

Mapping Southeast Asia for better mobility

Southeast Asia presents a uniquely challenging environment for mapping. Its road networks include narrow, one-way streets optimized for motorbikes and pedestrians, rapidly changing urban landscapes, and limited coverage from conventional mapping providers. 

“To meet the needs of the region, we had to build something hyperlocal and dynamic—mapping Southeast Asia as it evolves.”
Adrian Margin, Head of Data Science for Geo Mapping at Grab

Grab turned to OpenAI’s GPT-4o with vision fine-tuning to overcome these obstacles. 

By using its network of motorbike drivers and pedestrian partners, each equipped with 360-degree cameras, GrabMaps collected millions of street-level images to train and fine-tune models for detailed mapmaking. 

GPT-4o’s vision fine-tuning capabilities allowed GrabMaps to localize speed limit signs, turn restrictions, places, and road geometries more accurately.

Using vision fine-tuning to automate mapmaking

Grab’s initial experiments focused on matching speed limit signs to their corresponding roads.

The team fine-tuned GPT-4o using only 100 sample cases, combining street-level imagery and map tiles, and iterating through various hyperparameter adjustments to enhance accuracy.

Starting with a baseline accuracy of 67%, the model improved to 80% after two rounds of fine-tuning—a 13-percentage point gain.

The model excelled in handling complex scenarios such as elevated roads and occlusions, which previously required significant manual intervention. By cross-referencing street imagery with map tiles, the model made context-aware decisions akin to human operators.

“Fine-tuning GPT-4o with our data enabled us to handle complex geometries effectively, reducing manual interventions and operational costs,” says Adrian Margin, Head of Data Science for Geo Mapping at Grab.

Fine-Tuning Iterations
67%72%+5 points80%+13 points65%70%75%80%AccuracyIteration 0: Non-fine-tuned with Chain of Thought (CoT)Iteration 1: Vision fine-tuningIteration 2: Vision fine-tuning + hyperparametersIterations with GPT-4o

Reducing costs and driving trust in data

Implementing vision fine-tuning significantly boosted GrabMaps' efficiency and accuracy:

  • Lane count accuracy increased by 20%

  • Speed limit sign localization improved by 13%

  • Reduced manual mapping efforts, cutting operational costs and improving trust in data quality

  • Enhanced ability to address challenging scenarios like elevated signs and occlusions, leading to fewer errors in map outputs

These improvements translate to a more reliable platform for Grab’s internal operations and enterprise customers. Hyper-detailed maps now better serve the needs of millions of users and driver-partners every day, empowering economic activity across the region.

“GrabMaps is not just a tool for us—it’s a reflection of our commitment to Southeast Asia. OpenAI’s vision fine-tuning made our mapmaking process faster, smarter, and more cost-effective,” adds Margin.

Expanding to greater accessibility and responsiveness

Grab continues to expand its AI capabilities to make its platform more accessible and responsive. A voice assistant offering conversational, multilingual support for visually impaired and elderly users is in development to make navigating the app easier for everyone.

Grab is also building an advanced support chatbot to handle complex inquiries. By understanding detailed standard operating procedures (SOPs) and delivering empathetic, tailored responses, the chatbot aims to improve user experience while scaling efficiently.

“We’ve been a pioneer of AI adoption in the region, and believe that AI has a lot of potential to further transform the way we solve problems for our partners and users,” says Philipp Kandal, Chief Product Officer at Grab. “We’re excited to work with OpenAI as a partner to help accelerate the exploration and use of this technology.”