Enterprise Geocoding Workshop: Architecture and Issues
Craig Wolff, M.S. Eng
CA Environmental Health Tracking Program
Environmental Health Investigations Branch
CA Department of Health Services
Impact Assessment, Inc.
cwolff@dhs.ca.gov http://ehib.org http://catracking.com
What is Enterprise Geocoding
• An address broker that extracts geographic
coordinates (lon/lat, region identifier) for multiple
users/applications across an enterprise
• An address broker provides address
standardization/verification, geocoding, and
region overlay
• Implicit in this are standards for geocoding and application
interoperability
Yeah? So who cares?
• Geocoding is almost always the first step to
linking environmental and health data
• Not all Tracking stakeholders have capacity
(expertise, data/application resources) or
interest/mandate to geocode
• Address and geocode quality can increase if
it’s done as close as possible to the time an
event is reported
The Transaction is Everything
• Enterprise Geocoding is handled by a unit of
interaction called a transaction
• Request to server, processing at server, response to
client
• A request can handle one or many addresses
• Server-side processing includes address
standardization, verification, geocoding against
multiple street centerline datasets, and region overlay
• Response is the result of the processing
Web Services
• XML provides an interoperability standard for
messaging over the Web
• SOAP provides an interface to methods that
use “serializable” objects. Client and server
implementations have no platform
restrictions (e.g., Microsoft talks to Java
talks to ESRI)
Serializable Objects
• Address – street (prefix, number, street name,
type, suffix, etc), zip, city, error (error codes from
CASS-certified standardizer/verifier)
• GeocodeOptions – Options for how you want
addresses geocoded in a session
• GeocodeRecord – Processed result of a geocoding
transaction
• RegionIDs – List of extracted regions for a single
geocode
GeocodeOptions
• boolean doStreetID
• boolean doStandardizedAddress
• boolean doRegionID
• boolean doZipAsZone
• boolean doCityAsZone
• boolean doFirstMatchingCoordOnly
• boolean doMultiServiceErrorMetrics
• boolean doResourceSpecificRegions
• int sideOffset
• int spellingSensitivity
• int minimumMatchScore
• int minimumCandidateScore
• String [] streetResources
• String [] standardizationResources
• String [] regionResources
GeocodeRecord
• String [] status (M/U/T)
• short [] score (0-100)
• String [] side (L or R)
• double [] x
• double [] y
• String [] streetID
• RegionIDs [] regionIDs
• String [] metadataID
• float [] averageError
• Address standardizedAddress
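The array-valued fields appear to be parallel: index i in status, score, x, and y would describe the same result (for example, one result per street centerline resource). A minimal sketch of reading a record under that assumption follows. The class name GeocodeRecordSketch, the helper bestMatch, and the reading of status "M" as "matched" are illustrative assumptions, not part of the service.

```java
// Hypothetical, trimmed-down stand-in for GeocodeRecord (only four fields shown).
class GeocodeRecordSketch {
    String[] status;   // assumed: "M" = matched, other values = not usable
    short[] score;     // 0-100
    double[] x;
    double[] y;

    // Returns the index of the highest-scoring entry whose status is "M",
    // or -1 when no entry is matched.
    int bestMatch() {
        int best = -1;
        for (int i = 0; i < status.length; i++) {
            if (!"M".equals(status[i])) {
                continue;
            }
            if (best < 0 || score[i] > score[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

A caller would then read x[best] and y[best] only when bestMatch() returns a non-negative index.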
SOAP Methods
• public void initializeGeocode (String user, String password)
• public void setGeocodeOptions (GeocodeOptions options)
• public GeocodeRecord findAddress (Address address)
• public GeocodeRecord [] findAddresses (Address [] addresses)
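A minimal sketch of how a client might walk through these four methods within one session. The interface mirrors the signatures listed above; the data classes carry only a handful of the fields from the earlier slides, and StubGeocodeService, with its fixed score and coordinates, is purely hypothetical and stands in for the real SOAP endpoint.

```java
// Trimmed-down versions of the serializable objects; most fields are omitted.
class Address {
    String street, city, zip;
    Address(String street, String city, String zip) {
        this.street = street;
        this.city = city;
        this.zip = zip;
    }
}

class GeocodeOptions {
    boolean doRegionID;
    int minimumMatchScore;
}

class GeocodeRecord {
    String[] status;
    short[] score;
    double[] x, y;
}

// Mirrors the four SOAP methods listed above.
interface GeocodeService {
    void initializeGeocode(String user, String password);
    void setGeocodeOptions(GeocodeOptions options);
    GeocodeRecord findAddress(Address address);
    GeocodeRecord[] findAddresses(Address[] addresses);
}

// Hypothetical in-memory stand-in for the remote service; no real geocoding happens.
class StubGeocodeService implements GeocodeService {
    private boolean initialized;
    private GeocodeOptions options = new GeocodeOptions();

    public void initializeGeocode(String user, String password) {
        initialized = true;
    }

    public void setGeocodeOptions(GeocodeOptions options) {
        this.options = options;
    }

    public GeocodeRecord findAddress(Address address) {
        if (!initialized) {
            throw new IllegalStateException("call initializeGeocode first");
        }
        // Made-up result: reported as matched only if it clears the session minimum.
        short score = 90;
        GeocodeRecord r = new GeocodeRecord();
        r.status = new String[] { score >= options.minimumMatchScore ? "M" : "U" };
        r.score = new short[] { score };
        r.x = new double[] { -122.0 };
        r.y = new double[] { 37.0 };
        return r;
    }

    public GeocodeRecord[] findAddresses(Address[] addresses) {
        GeocodeRecord[] out = new GeocodeRecord[addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            out[i] = findAddress(addresses[i]);
        }
        return out;
    }
}

class GeocodeClientDemo {
    public static void main(String[] args) {
        GeocodeService service = new StubGeocodeService();
        service.initializeGeocode("user", "password");   // once per session
        GeocodeOptions options = new GeocodeOptions();
        options.minimumMatchScore = 80;
        service.setGeocodeOptions(options);               // options persist for the session
        GeocodeRecord record = service.findAddress(new Address("1 Main St", "Oakland", "94612"));
        System.out.println(record.status[0] + " " + record.x[0] + " " + record.y[0]);
    }
}
```

Setting the options once and then calling findAddress or findAddresses repeatedly keeps each request small, which is the point of holding state in the session.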
Technologies
• SQL Server and ArcSDE for Enterprise GIS
– Storage of street centerline and region data
– Geocoding engine
– Application server for GIS operations
– Java Client API for ArcSDE
– http://arcsdeonline.esri.com
• ZP4
– CASS-certified address standardization
– C API
– http://www.semaphorecorp.com/cgi/zp4.html
More Technologies
• Apache Axis
– Java-based client/server web services tool
– Exposes Java methods and objects on server-side
– http://ws.apache.org/axis/
• Apache Tomcat
– J2EE application server
– Also runs ArcSDE Client API & Axis
– http://jakarta.apache.org/tomcat/index.html
Even More Technologies
• Java Topology Suite (JTS)
– ArcSDE Client API bug workaround
– More robust spatial analysis methods/objects
– http://www.vividsolutions.com/jts/jtshome.htm
• Visual Studio .NET, C#
– For creating ZP4 web service
– For creating web service clients
Building a City Geocoding Index
• Update street centerline attributes with the soundex of
the zip’s PO name on the left and right sides
• Build the geocoding index on the left/right city soundex;
note: ArcCatalog will overwrite any previous
indexes built for the same streets, see
http://forums.esri.com/Thread.asp?c=2&f=59&t=96397#271863
for creating a locator using a custom
template and the command line interface
• Pass soundex(city name) from the address table
• Never accept candidates that have tying scores
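The tie rule in the last bullet can be sketched as a small selection function. This is an illustration only: the class TieRule, the method uniqueBest, and the use of a minimum score threshold alongside the tie check are assumptions, not the workshop's code.

```java
class TieRule {
    // Returns the index of the single highest score if it is unique and at
    // least minimumMatchScore; returns -1 when there are no candidates, the
    // top score is shared by two or more candidates, or it is too low.
    static int uniqueBest(int[] scores, int minimumMatchScore) {
        int best = -1;
        boolean tied = false;
        for (int i = 0; i < scores.length; i++) {
            if (best < 0 || scores[i] > scores[best]) {
                best = i;
                tied = false;          // a new top score clears any earlier tie
            } else if (scores[i] == scores[best]) {
                tied = true;           // another candidate shares the top score
            }
        }
        if (best < 0 || tied || scores[best] < minimumMatchScore) {
            return -1;
        }
        return best;
    }
}
```

Rejecting ties matters here because two cities can share a soundex code, so equal scores may point to streets in different cities.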
Soundex
• Phonetic coding of a word; geocoders use a 4-character
scheme
• Codes:
• First character in code is same as input
• Letters with codes of 0 are not included
• Words with fewer than 4 corresponding codes receive
trailing zeros
• Examples
– Poppy: P110
– Oxford: O213
– Main: M500
– Santa Clara: S532
– Santa Clarita: S532
– Los Angeles: L252
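For reference, a minimal sketch of the standard American Soundex algorithm. The letter-to-digit table is the commonly published one and is an assumption here, since the slide's own code table did not survive; the class and method names are hypothetical. A geocoder's variant may treat repeated letters differently, so results can differ from the examples above for some words: this version gives P100 for Poppy and O216 for Oxford, while it agrees on Main, Santa Clara, Santa Clarita, and Los Angeles.

```java
import java.util.Locale;

class SoundexSketch {
    // Standard American Soundex digit for each letter A..Z ('0' = letter is dropped).
    //                                   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char raw : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue;                       // skip spaces and punctuation
            }
            char code = CODES.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw);                // first character is kept as-is
                prev = code;
                continue;
            }
            if (code != '0' && code != prev) {
                out.append(code);               // drop zero codes and adjacent repeats
            }
            if (raw != 'H' && raw != 'W') {
                prev = code;                    // H and W do not separate repeated codes
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0');                    // pad short codes with trailing zeros
        }
        return out.toString();
    }
}
```

Because Santa Clara and Santa Clarita collapse to the same code, a city soundex narrows candidates but cannot separate every pair of cities on its own.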
Geocoding in Java
• Use ArcSDE Java Client API to communicate with
ArcSDE
• Use ArcSDE’s Server Side Application (SSA)
construct
• See http://forums.esri.com/Attachments/6591.pdf
for an example
Useful Patterns
• ArcSDE connections are notoriously slow to
initialize: re-use connections from a Connection
Factory, and close connections after a timeout
• Lots of data sources, server names, passwords, etc.:
store this info in a database table and create an
object that encapsulates data resources; never
hardcode
• Use Axis/Tomcat sessions to minimize redundant
parameter passing
– Server: ((HttpServletRequest) MessageContext.getCurrentContext().getProperty(HTTPConstants.MC_HTTP_SERVLETREQUEST)).getSession()
– Client: setMaintainSession(true)
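The connection re-use pattern in the first bullet can be sketched without any ArcSDE classes. Everything below is an assumption made for illustration: the class PooledConnectionFactory, its method names, and the choice to pass the current time in as a parameter (which keeps the sketch easy to test). A real factory would wrap whatever connection type the ArcSDE Java Client API provides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical pool: hands back idle connections and closes those idle too long.
class PooledConnectionFactory<C extends AutoCloseable> {
    private static final class Idle<T> {
        final T conn;
        final long releasedAt;
        Idle(T conn, long releasedAt) {
            this.conn = conn;
            this.releasedAt = releasedAt;
        }
    }

    private final Supplier<C> opener;          // performs the slow connection setup
    private final long idleTimeoutMillis;
    private final Deque<Idle<C>> idle = new ArrayDeque<>();  // newest first, oldest last

    PooledConnectionFactory(Supplier<C> opener, long idleTimeoutMillis) {
        this.opener = opener;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Re-uses the most recently released connection, or opens a new one.
    synchronized C acquire(long nowMillis) {
        closeExpired(nowMillis);
        Idle<C> entry = idle.pollFirst();
        return entry != null ? entry.conn : opener.get();
    }

    // Returns a connection to the pool instead of closing it.
    synchronized void release(C conn, long nowMillis) {
        idle.addFirst(new Idle<>(conn, nowMillis));
    }

    // Closes every connection that has been idle for at least the timeout.
    synchronized void closeExpired(long nowMillis) {
        while (!idle.isEmpty()
                && nowMillis - idle.peekLast().releasedAt >= idleTimeoutMillis) {
            try {
                idle.pollLast().conn.close();
            } catch (Exception e) {
                // Sketch only: a real service would log the failure.
            }
        }
    }
}
```

In a running service the timestamps would come from System.currentTimeMillis(), and closeExpired could also be called from a periodic task so idle connections do not linger between requests.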
Client Implementations
• .NET thin desktop client
– Consumes centralized geocoding service and address
standardization service
– Input: text file, Access table, or SQL Server table of addresses
– Requires Windows and .NET Framework
• Browser-based HTML thin client
– Better compatibility
– More effort in inputting addresses
– Easier to couple with environmental linkage services
Future Steps
• Tools developed thus far address automated
geocoding; still need tools for interactive
geocoding on a map with orientation layers
– Many map services (some free) from USGS, Google, and
Microsoft layer vectors and imagery in a basemap
• Need quicker geocoding engine (commercial
service? Centrus?)
• Need less cumbersome address standardization
service (USPS?)