Geodata

The geographical coordinate tools are all currently experimental.

Access

Dumps are available upon request; this may be automated in the future if there is demand. Toolserver users can access these databases by connecting to the server for their respective wikis. phpMyAdmin will help introduce users to the layout of the tables.

SELECT page_id, gc_lat, gc_lon, gc_region, page_title
FROM u_dispenser_p.coord_enwiki
JOIN page ON page_id = gc_from
WHERE page_namespace = 0
AND gc_from NOT IN (SELECT DISTINCT il_from FROM imagelinks)
LIMIT 100;
The above query yields the first 100 pages that have geographical coordinates but lack images. Optimization and more sophisticated image analysis are left as an exercise for the reader.
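One possible optimization is to replace the NOT IN subquery with a correlated NOT EXISTS anti-join. The sketch below only illustrates that rewrite: it runs against a tiny in-memory SQLite stand-in rather than the Toolserver databases, the u_dispenser_p prefix is dropped, the sample rows and function names are made up, and an ORDER BY is added so that "first 100" is deterministic. Whether the rewrite is actually faster on the real MySQL server is an assumption that would need measuring.

```python
import sqlite3


def pages_with_coords_but_no_images(conn, limit=100):
    """Return (page_id, gc_lat, gc_lon, gc_region, page_title) rows for
    main-namespace pages that have a coordinate but no imagelinks entry."""
    sql = """
        SELECT page_id, gc_lat, gc_lon, gc_region, page_title
        FROM coord_enwiki
        JOIN page ON page_id = gc_from
        WHERE page_namespace = 0
          AND NOT EXISTS (SELECT 1 FROM imagelinks WHERE il_from = gc_from)
        ORDER BY page_id
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


def demo_connection():
    """Build a small in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        CREATE TABLE coord_enwiki (gc_from INTEGER, gc_lat REAL, gc_lon REAL, gc_region TEXT);
        INSERT INTO page VALUES (1, 0, 'With_image'), (2, 0, 'No_image'), (3, 1, 'Talk_page');
        INSERT INTO imagelinks VALUES (1, 'Example.jpg');
        INSERT INTO coord_enwiki VALUES
            (1, 40.7, -74.0, 'US-NY'), (2, 52.5, 13.4, 'DE-BE'), (3, 48.9, 2.35, 'FR');
    """)
    return conn


if __name__ == "__main__":
    for row in pages_with_coords_but_no_images(demo_connection()):
        print(row)
```

Only the page without an imagelinks row in namespace 0 is returned; the page with an image and the page outside the main namespace are filtered out.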

Known usage

locateCoord.py

locateCoord.py is a very simple, quickly coded tool that gives users the ability to retrieve data from the database. This query will give coordinates that are within 5 km of the center of New York City. Eventually a rewrite will be needed, with support for JSON/XML/YAML/etc. and better option support, as the current architecture is limited.
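A radius search of this kind can be sketched as a cheap bounding-box prefilter followed by an exact great-circle check. This is not the source of locateCoord.py; the function names, the row layout (dicts keyed by gc_lat and gc_lon), and the approximate New York City center used in the example are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Small-radius approximation of the box enclosing the search circle.
    Does not handle the poles or the 180 degree meridian."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = math.degrees(radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_radius(rows, lat, lon, radius_km):
    """Return (distance_km, row) pairs inside the radius, nearest first."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for row in rows:
        # Cheap rejection first; in SQL this part could use an index.
        if not (min_lat <= row["gc_lat"] <= max_lat and min_lon <= row["gc_lon"] <= max_lon):
            continue
        distance = haversine_km(lat, lon, row["gc_lat"], row["gc_lon"])
        if distance <= radius_km:
            hits.append((distance, row))
    hits.sort(key=lambda hit: hit[0])
    return hits


if __name__ == "__main__":
    # Made-up rows around an approximate city center (40.7128, -74.0060).
    sample = [
        {"page_title": "Near", "gc_lat": 40.7428, "gc_lon": -74.0060},
        {"page_title": "Box_corner", "gc_lat": 40.7528, "gc_lon": -73.9560},
        {"page_title": "Far", "gc_lat": 40.8128, "gc_lon": -74.0060},
    ]
    for distance, row in within_radius(sample, 40.7128, -74.0060, 5.0):
        print(row["page_title"], round(distance, 2))
```

The "Box_corner" row shows why the second step is needed: it passes the bounding-box test but lies more than 5 km from the center, so the great-circle check rejects it.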
locateCoord.py source code

geosearch.py

geosearch.py is a simple tool to assist in tracking down pages in which common errors or inappropriate data have been entered. Other languages/databases are supported with the parameter &site=languageprefix (e.g. commons, de, fr, ...). More Examples
geosearch.py source code

iwcoord.py

iwcoord.py finds possible coordinates that can be copied from one language to another. It doesn't actually use ghel or the database.

iwcoord.py source code

regioncheck.py

regioncheck.py produces reports using the Administrative Boundaries - First Level (ESRI) dataset. It retrieves all state boundary polygons and finds the shortest distance to each one. If the point is found inside a polygon, it skips it and moves to the next point. This way it gives the shortest distance for all points outside of the country.
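The check described above can be sketched with a ray-casting point-in-polygon test and a point-to-segment distance. This is a simplified illustration and not the code of regioncheck.py: it treats coordinates as planar (no great-circle math), uses plain vertex lists instead of the ESRI dataset, and all function names are assumptions.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(x, y, x1, y1, x2, y2):
    """Planar distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def distance_to_polygon(x, y, polygon):
    """Shortest planar distance from (x, y) to the polygon boundary."""
    n = len(polygon)
    return min(
        distance_to_segment(x, y, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )


def check_points(points, polygons):
    """points: list of (name, x, y); polygons: dict of region -> vertex list.
    Points inside any polygon are skipped; for the rest, report the nearest
    region and the distance to it."""
    report = {}
    for name, x, y in points:
        if any(point_in_polygon(x, y, poly) for poly in polygons.values()):
            continue
        region, dist = min(
            ((r, distance_to_polygon(x, y, p)) for r, p in polygons.items()),
            key=lambda item: item[1],
        )
        report[name] = (region, dist)
    return report
```

With two square regions A (0..2) and B (5..7) on the x axis, a point at (3, 1) would be reported as 1.0 away from A, while a point at (1, 1) is inside A and is skipped.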

Wikipedia-World

It is reported that data from here is being used on the Wikipedia-World project.

Logs

Errors and warnings output by the tool are available at http://toolserver.org/~dispenser/logs/. Errors are items ghel could not parse, while warnings are things it could parse but that should be corrected so other programs can read them correctly.
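The distinction between an error and a warning can be illustrated with a toy parser for GeoHack-style parameter strings such as 40_42_46_N_74_0_21_W_type:city. This is not ghel.py and does not reproduce its rules: the function name, the returned structure, and the specific checks (lower-case direction letters, minutes or seconds of 60 or more) are assumptions chosen only to show the two categories.

```python
def parse_coord_params(params):
    """Parse a GeoHack-style string. Returns (result, errors, warnings);
    result is None when the string could not be parsed at all."""
    errors, warnings = [], []
    tokens = [t for t in params.split("_") if t]
    pos = 0
    degrees = []
    for axis, letters, limit in (("latitude", "NS", 90.0), ("longitude", "EW", 180.0)):
        parts = []
        # Collect up to three numbers (degrees, minutes, seconds).
        while pos < len(tokens) and tokens[pos].upper() not in ("N", "S", "E", "W"):
            try:
                parts.append(float(tokens[pos]))
            except ValueError:
                errors.append("%s: %r is not a number" % (axis, tokens[pos]))
                return None, errors, warnings
            pos += 1
        if pos == len(tokens) or tokens[pos].upper() not in letters or not 1 <= len(parts) <= 3:
            errors.append("%s: expected 1-3 numbers followed by %s" % (axis, "/".join(letters)))
            return None, errors, warnings
        letter = tokens[pos]
        pos += 1
        # Parseable, but worth correcting: these become warnings.
        if letter != letter.upper():
            warnings.append("%s: direction %r should be upper case" % (axis, letter))
        if any(p >= 60 for p in parts[1:]):
            warnings.append("%s: minutes or seconds are 60 or more" % axis)
        value = sum(p / 60 ** k for k, p in enumerate(parts))
        if value > limit:
            errors.append("%s: %s is out of range" % (axis, value))
            return None, errors, warnings
        degrees.append(-value if letter.upper() in "SW" else value)
    # Remaining key:value tokens such as type:city or region:US-NY.
    extras = dict(t.split(":", 1) for t in tokens[pos:] if ":" in t)
    result = {"gc_lat": degrees[0], "gc_lon": degrees[1]}
    result.update(extras)
    return result, errors, warnings
```

Under these assumed rules, 40.7_n_74.0_w parses but produces two warnings about the lower-case letters, while 40.7_74.0 has no direction letters and is reported as an error.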

Things left to do

  • Develop an API capable of writing out in HTML, JSON, serialized PHP, KML, OSM, and XML.
  • Language-independent article ranking table (length, incoming links, interwiki links)
  • Reset primary bit for multiple primary coordinates from the same article
  • WikiMiniAtlas/OSM data integration under heavy load without killing the databases.
  • Reimplement features in GeoHack.
  • Documentation: source code should be documented so that a novice could understand it.
  • Live updating; MySQL trigger functionality is required for this.
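As a rough idea of what the multi-format API from the first item might look like, the sketch below writes the same rows out as JSON and as KML. No such API exists yet; the function names and the JSON key names are hypothetical, and only two of the listed formats are shown. Note that KML expects coordinates in longitude,latitude order.

```python
import json
from xml.sax.saxutils import escape


def rows_to_json(rows):
    """Serialize coordinate rows (dicts using the gc_* field names) as JSON."""
    features = [
        {"id": r["gc_from"], "name": r["gc_name"], "lat": r["gc_lat"], "lon": r["gc_lon"]}
        for r in rows
    ]
    return json.dumps(features, ensure_ascii=False)


def rows_to_kml(rows):
    """Serialize the same rows as a minimal KML document."""
    placemarks = []
    for r in rows:
        placemarks.append(
            "<Placemark><name>%s</name>"
            "<Point><coordinates>%s,%s</coordinates></Point></Placemark>"
            % (escape(r["gc_name"]), r["gc_lon"], r["gc_lat"])  # lon first, then lat
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>%s</Document></kml>'
        % "".join(placemarks)
    )


if __name__ == "__main__":
    # Made-up row for illustration.
    rows = [{"gc_from": 1, "gc_name": "Example", "gc_lat": 40.7128, "gc_lon": -74.006}]
    print(rows_to_json(rows))
    print(rows_to_kml(rows))
```

Keeping each writer as a function over the same row structure would let further formats (serialized PHP, OSM, plain XML) be added without touching the query code.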

Fields

This section is a rough draft of definitions.

gc_from
Article ID
gc_lat
Latitude
gc_lon
Longitude
gc_alt
Elevation in meters above sea level
gc_head
The direction in degrees from north (if applicable)
gc_dim
The rough size of the object
gc_type
WP:GEO/#Type:
gc_size
City population size
gc_globe
Which body the coordinates are on (NOTE: get standards for other bodies)
gc_primary
Whether the coordinate represents the primary object in the photo or article (TODO: word this better)
gc_name
The name of the object; if none is given, the article title will be used
gc_location
MBR point binary

Schema summary

mysql> describe coord_dewiki;
+-------------+----------------------------------------------------------+------+-----+---------+-------+
| Field       | Type                                                     | Null | Key | Default | Extra |
+-------------+----------------------------------------------------------+------+-----+---------+-------+
| gc_from     | int(8) unsigned                                          | NO   | MUL | NULL    |       |
| gc_lat      | float                                                    | NO   |     | NULL    |       |
| gc_lon      | float                                                    | NO   |     | NULL    |       |
| gc_alt      | float                                                    | YES  |     | NULL    |       |
| gc_head     | float                                                    | YES  |     | NULL    |       |
| gc_dim      | float unsigned                                           | YES  |     | NULL    |       |
| gc_type     | varchar(63)                                              | YES  |     | NULL    |       |
| gc_size     | float                                                    | YES  |     | NULL    |       |
| gc_region   | varchar(127)                                             | YES  |     | NULL    |       |
| gc_globe    | enum('','mercury','venus','earth','moon','mars','ceres') | YES  |     | earth   |       |
| gc_primary  | tinyint(1)                                               | NO   |     | 0       |       |
| gc_name     | varchar(255)                                             | NO   |     | NULL    |       |
| gc_location | point                                                    | NO   | MUL | NULL    |       |
+-------------+----------------------------------------------------------+------+-----+---------+-------+
13 rows in set

Dumps

The database is dumped weekly and is accessible from http://toolserver.org/~dispenser/dumps/ as compressed SQL dumps. Dumping is scheduled for Thursdays at 9:40 UTC.

Source code

  • geodbcompiler.py - Simple application to create and fill the database with the geographic data
  • ghel.py - GeoHack External Link parsing library