MapTab is a comprehensive benchmark designed to evaluate the map understanding and spatial reasoning capabilities of Vision-Language Models (VLMs). The benchmark focuses on two core tasks: route planning and map-based question answering, using both metro maps and travel maps.
Project Page: https://ziqiao-shang.github.io/MapTab-Leaderboard/
MapTab evaluates VLMs on their ability to:
The benchmark includes various route planning subtasks with different input modalities and constraint levels:
| Task | Input | Description |
|------|-------|-------------|
| shortest_path_only_map | Map Image | Route planning using only visual map |
| shortest_path_only_tab | Table (JSON) | Route planning using only tabular data |
| shortest_path_only_csv | Table (CSV) | Route planning using only CSV data (ablation) |
| shortest_path_map_and_tab_no_constraint | Map + Table | Combined input without constraints |
| shortest_path_map_and_csv | Map + CSV | Combined with CSV format (ablation) |
| shortest_path_map_and_tab_with_constraint_1 | Map + Table | With constraint type 1 |
| shortest_path_map_and_tab_with_constraint_2 | Map + Table | With constraint type 2 |
| shortest_path_map_and_tab_with_constraint_3 | Map + Table | With constraint type 3 |
| shortest_path_map_and_tab_with_constraint_4 | Map + Table | With constraint type 4 |
| shortest_path_map_and_tab_with_constraint_1_2_3_4 | Map + Table | With all four constraints |
| shortest_path_map_and_tab_with_constraint_1_2_4 | Map + Table | With constraints 1, 2, and 4 |
| shortest_path_map_and_tab_with_constraint_1_3_4 | Map + Table | With constraints 1, 3, and 4 |
| shortest_path_map_and_tab_with_constraint_2_3_4 | Map + Table | With constraints 2, 3, and 4 |
| only_vertex2 | Map + Table | Special vertex subset task |
| shortest_path_csv_vertex2 | Map + CSV | CSV format with vertex subset (ablation) |
| shortest_path_map_and_tab_csv_constraint_1_2_3_4 | Map + CSV | CSV format with all constraints (ablation) |
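
For the table-only subtasks, a reference route can in principle be computed directly from the edge data. The following is a minimal sketch, assuming a hypothetical edge list of `(station_a, station_b, cost)` triples. The actual schema of the files in tabulars/ is not described here, and the benchmark's own ground-truth procedure may differ.

```python
import heapq
from collections import defaultdict


def shortest_path(edges, start, goal):
    """Dijkstra over an undirected, non-negatively weighted edge list.

    `edges` is a hypothetical list of (station_a, station_b, cost) triples;
    this is an assumed format, not the MapTab table schema.
    Returns (total_cost, [stations...]) or None if goal is unreachable.
    """
    graph = defaultdict(list)
    for a, b, cost in edges:
        graph[a].append((b, cost))
        graph[b].append((a, cost))

    best = {start: 0}          # cheapest known cost to each station
    prev = {}                  # predecessor on the cheapest known path
    heap = [(0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            # Walk predecessors back to the start to rebuild the route.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue           # stale heap entry
        for nxt, cost in graph[node]:
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    return None
```

On a toy network such as `[("A", "B", 2), ("B", "C", 2), ("A", "C", 5), ("C", "D", 1)]`, the route from `"A"` to `"D"` goes through `"B"` because the direct `A`-`C` edge is more expensive than the two-hop detour.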
QA tasks evaluate map comprehension across different aspects:
| # | Task | Description |
|---|------|-------------|
| 1 | 1_qa_only_pic_global | Global questions using only map image |
| 2 | 2_qa_only_pic_part | Local/partial questions using only map image |
| 3 | 3_qa_only_pic_spatial_judge | Spatial judgment using only map image |
| 4 | 4_qa_edge_tab_global | Global edge questions with table |
| 5 | 5_qa_edge_tab_part | Local edge questions with table |
| 6 | 6_qa_edge_tab_spatial_judge | Spatial edge judgment with table |
| 7 | 7_qa_vertex_tab_global | Global vertex questions with table |
| 8 | 8_qa_vertex_tab_part | Local vertex questions with table |
| 9 | 9_qa_vertex_tab_spatial_judge | Spatial vertex judgment with table |
| 10 | 10_qa_pic_and_tab_global | Global questions with map and table |
| 11 | 11_qa_pic_and_tab_part | Local questions with map and table |
| 12 | 12_qa_pic_and_tab_spatial_judge | Spatial judgment with map and table |
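
The QA task names follow a regular `<index>_qa_<input>_<scope>` pattern, which makes it easy to group results by input modality or question scope. A small sketch of such a parser is shown here; the function name `parse_qa_task` is hypothetical and not part of the MapTab code.

```python
# Scope suffixes as they appear in the task names listed above.
SCOPES = ("spatial_judge", "global", "part")


def parse_qa_task(name):
    """Split e.g. '10_qa_pic_and_tab_global' into (10, 'pic_and_tab', 'global')."""
    index, _, rest = name.partition("_qa_")
    for scope in SCOPES:
        suffix = "_" + scope
        if rest.endswith(suffix):
            return int(index), rest[: -len(suffix)], scope
    raise ValueError(f"unrecognized QA task name: {name!r}")
```

For example, `parse_qa_task("8_qa_vertex_tab_part")` separates the task into its index `8`, input `vertex_tab`, and scope `part`.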
The dataset includes two map types:
The current release includes the following files under both metromap/ and travelmap/:
Note: The current release contains only the test query set for the route planning (RP) task. QA task queries and RP training queries will be released in future updates.
Files in these five folders (data/, qa_data/, images/, prompts/, tabulars/) can be downloaded from Hugging Face: https://huggingface.co/datasets/szq-nju/MapTab
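
One way to fetch the files programmatically is through the `huggingface_hub` package. The sketch below downloads the whole dataset repository and then checks that the five folders exist for each map type. The helper names and the assumed local layout `<root>/<map_type>/<folder>` are illustrative assumptions; adjust them if the release is organized differently.

```python
from pathlib import Path

MAP_TYPES = ("metromap", "travelmap")
FOLDERS = ("data", "qa_data", "images", "prompts", "tabulars")


def download_maptab(local_dir):
    """Fetch the full dataset repository from the Hugging Face Hub."""
    # Imported lazily so the layout check below works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="szq-nju/MapTab",
        repo_type="dataset",
        local_dir=local_dir,
    )


def missing_folders(root):
    """List expected '<map_type>/<folder>' directories that are absent.

    Assumes the layout <root>/<map_type>/<folder> (an assumption, not
    something the release notes guarantee).
    """
    root = Path(root)
    return [
        f"{map_type}/{folder}"
        for map_type in MAP_TYPES
        for folder in FOLDERS
        if not (root / map_type / folder).is_dir()
    ]
```

A typical use would be to call `download_maptab("MapTab")` once and then inspect `missing_folders("MapTab")`; an empty list means every expected folder is in place.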
The framework has been tested with the following model identifiers in src/generate_lib/utils.py:
Note: For API-based models, we provide a single integration example, for Aliyun Bailian, in src/generate_lib/qwen_api.py. Because API platforms vary, implement interfaces for other providers by following this example.
| Metric | Description |
|--------|-------------|
| all_acc | Exact match accuracy (complete route correctness) |
| part_acc | Partial accuracy (proportion of correct route segments) |
| Difficulty_score | Difficulty-weighted score based on map and query complexity |
| accuracy | Proportion of correct numeric answers |
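
To make the difference between the exact-match and partial route metrics concrete, here is a minimal sketch. It assumes a route is a list of stops and that a "segment" is a directed pair of consecutive stops; this definition is an assumption for illustration, and the repository's evaluation code may define segments differently.

```python
def route_segments(route):
    """Consecutive (from, to) pairs of a route given as a list of stops."""
    return list(zip(route, route[1:]))


def all_acc(pred, gold):
    """1.0 if the predicted route matches the reference exactly, else 0.0."""
    return float(pred == gold)


def part_acc(pred, gold):
    """Fraction of reference segments that also appear in the prediction.

    Assumed definition for illustration only.
    """
    gold_segs = route_segments(gold)
    if not gold_segs:
        # A route with fewer than two stops has no segments to compare.
        return all_acc(pred, gold)
    pred_segs = set(route_segments(pred))
    return sum(seg in pred_segs for seg in gold_segs) / len(gold_segs)
```

Under these assumptions, a prediction `["A", "B", "X", "D"]` against the reference `["A", "B", "C", "D"]` scores zero on exact match but still gets credit for the shared `A`-`B` segment, one of three reference segments.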
If you use MapTab in your research, please cite: