Via geothought comes word of the latest project from the developer of OpenHeatMap, Pete Warden. The Data Science Toolkit wraps up a number of open-source data tools into a combined server/data package that you can install as a VMware image or Amazon EC2 package (downloads/instructions included). You can also set it up as a stand-alone Linux-based server, or install it on a hosting service, with instructions available at the project’s GitHub repository. Geo-related services include:
- Geocoding: Converts a US address to latitude/longitude.
- Coordinates to political areas: Enter coordinates, and get country/state/region/neighborhood data. For example, for (37.769456,-122.429128), you get:
(United States, usa, country) (California, us06, state) (San Francisco, 06_075, county) (San Francisco, 06_67000, city) (Eighth district, CA, 06_08, constituency) (Castro-Upper Market, Castro-Upper Market|San Francisco|CA, neighborhood)
- Geodict: “pulls country, city and region names from unstructured English text, and returns their coordinates.” This one isn’t fully working yet: it sometimes returns results and other times nothing, even for the same query.
- IP Address to Coordinates: Translates an IPv4 numeric address to coordinates. These aren’t always the coordinates directly associated with the website itself. For example, this site is based in Arizona, but the hosting company’s server address is in Chicago, and used to be in Utah.
The toolkit also includes a built-in REST/JSON-based API for web services, so you can invoke it from other websites. The big advantage of this approach is that you can set up your own server for these data queries, free of the daily limits that similar services impose. You can test out the current services on the website.
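To give a rough idea of what invoking a REST/JSON service like this from your own code might look like, here is a minimal Python sketch. Treat the details as assumptions rather than the toolkit's documented API: the host `http://localhost:8080` is a hypothetical address for a server you run yourself, the service name `coordinates2politics` and the `/<service>/<query>` URL layout are guesses at the path format, and the helper names `build_query_url` and `fetch_json` are made up for illustration. Check the project's own documentation for the real paths before relying on any of this.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: hypothetical address of a toolkit server you host yourself.
BASE_URL = "http://localhost:8080"


def build_query_url(base_url, service, query):
    """Join base URL, service name, and URL-encoded query into one request URL.

    The /<service>/<query> layout is an assumption, not a documented format.
    """
    return f"{base_url.rstrip('/')}/{service}/{quote(query, safe='')}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Build (but do not send) a request for the coordinates used in the example above.
    url = build_query_url(BASE_URL, "coordinates2politics", "37.769456,-122.429128")
    print(url)
    # With a server actually running at BASE_URL, you would then call:
    # result = fetch_json(url)
```

Because the base URL is just a parameter, the same code can point at the public website for testing or at your own server once it is set up, which is where the freedom from daily limits comes in.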
Here’s a talk on this topic by Pete Warden from this year’s GigaOm Structure Big Data 2011 conference: