Archiving microdata Standards and good practices United Nations Statistics Commission New York, February 26, 2009 Olivier Dupriez World Bank, Development Data Group and International Household Survey Network odupriez@worldbank.org.
Archiving microdata
Standards and good practices
United Nations Statistics Commission
New York, February 26, 2009
Olivier Dupriez
World Bank, Development Data Group
and
International Household Survey Network
odupriez@worldbank.org
The value of data
• Surveys and censuses
– High cost → high value?
• Data have value beyond the purpose for
which they were originally collected
(“repurposing” of data)
– Large under-exploited potential
• Condition: proper archiving
– Documentation, dissemination, preservation
Data archiving – Two models
By a specialized data center (“trusted repository”)
(US, Canada, Europe)
• Often academic
• High level of expertise
• Infrastructure
• Standards and best practices for documentation
• Formal dissemination and preservation policies and procedures
• Support to users
By the data producer
(Most developing countries)
• Not seen as a key role
• Lack of expertise
• Inappropriate infrastructure
• Ad hoc practices
• No compliance with international standards
• Unclear policies and procedures
Sharing good practices
Objective: transfer data archiving good practices
and standards to data producers
International Household Survey Network (IHSN)
– A network of international agencies (coordinated by
World Bank/PARIS21)
– Develops tools, guidelines, training materials
– Advocates compliance with good practices
and international standards
www.ihsn.org
Microdata documentation
Good documentation is needed to:
– Properly analyze the data
– Increase credibility of derived indicators and analysis
– Allow replication of data collection or analysis
– Build institutional memory
DDI + Dublin Core metadata standards (XML)
A checklist of everything you need to know
– Study description
– File description
– Variable description
– Related materials
www.ddialliance.org
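To make the four-part checklist concrete, here is a minimal sketch of how such a metadata file could be assembled in Python. The element names (codeBook, stdyDscr, fileDscr, dataDscr, otherMat) follow DDI Codebook naming, but the structure is heavily simplified and has not been validated against the DDI schema. The function name, its parameters, and the survey, file, and variable in the usage lines are hypothetical illustrations, not material from the presentation.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, file_name, variables, related_materials):
    """Build a small DDI-Codebook-style XML tree (simplified sketch).

    variables: list of (name, label, question_text) tuples.
    related_materials: list of labels, e.g. "Questionnaire".
    """
    codebook = ET.Element("codeBook")

    # Study description: identifies the survey
    study = ET.SubElement(codebook, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # File description: identifies the data file
    file_dscr = ET.SubElement(codebook, "fileDscr")
    file_txt = ET.SubElement(file_dscr, "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    # Variable description: label and literal question for each variable
    data_dscr = ET.SubElement(codebook, "dataDscr")
    for name, label, question in variables:
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label
        qstn = ET.SubElement(var, "qstn")
        ET.SubElement(qstn, "qstnLit").text = question

    # Related materials: questionnaires, reports, manuals
    for label in related_materials:
        other = ET.SubElement(codebook, "otherMat")
        ET.SubElement(other, "labl").text = label

    return codebook


# Hypothetical survey, file, and variable, for illustration only
example = build_codebook(
    "Example Household Survey 2008",
    "household.dat",
    [("hhsize", "Household size", "How many people live in this household?")],
    ["Questionnaire"],
)
print(ET.tostring(example, encoding="unicode"))
```

A production file would carry far more detail (sampling, coverage, interviewer instructions), which is what a dedicated metadata editor is meant to capture.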
IHSN DDI Metadata Editor
Documenting the study: sampling, data
collection, scope and coverage, etc.
IHSN DDI Metadata Editor
Documenting files and variables: formulation
of question, interviewer’s instructions,
computation of variables, etc.
IHSN DDI Metadata Editor
Metadata in XML format can be “transformed”
into HTML, PDF, and other formats
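As an illustration of what “transforming” XML metadata can mean, the sketch below reads a small DDI-style fragment and renders its variable list as an HTML table. It is not how the IHSN editor works internally; real pipelines commonly use XSLT stylesheets for this step. The sample XML, the function name, and the assumption that the document carries no XML namespace are all simplifications made for this example.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata fragment (no XML namespace assumed)
SAMPLE_XML = """<codeBook>
  <dataDscr>
    <var name="hhsize"><labl>Household size</labl></var>
    <var name="region"><labl>Region of residence</labl></var>
  </dataDscr>
</codeBook>"""


def variables_to_html(xml_text):
    """Render the variables of a DDI-style document as an HTML table."""
    root = ET.fromstring(xml_text)
    rows = ["<tr><th>Variable</th><th>Label</th></tr>"]
    for var in root.iter("var"):
        # Escape values so that metadata text cannot break the HTML
        name = html.escape(var.get("name", ""))
        label = html.escape(var.findtext("labl", default=""))
        rows.append(f"<tr><td>{name}</td><td>{label}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(variables_to_html(SAMPLE_XML))
```

Because the source stays in XML, the same file can feed other renderers (for example a PDF report) without re-entering the documentation.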
Microdata cataloguing
XML/DDI metadata is web-ready, “browsable and searchable”
Microdata dissemination
• Growing demand for microdata
• Potential to add much value to existing data
• But requires:
– Enabling legislation
– Formal policy/procedures (IHSN guidelines)
– Technical capacity to prepare data for dissemination
• Documenting, cataloguing
• Anonymizing (IHSN tools being tested)
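One common first step when preparing microdata for release is to look for records that are unique on a combination of identifying variables. The sketch below is a generic illustration of that idea and does not reproduce the IHSN tools mentioned above. The function name, the threshold k, and the sample records are hypothetical.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=2):
    """Return the indexes of records whose combination of
    quasi-identifier values is shared by fewer than k records."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]


# Hypothetical records, for illustration only
records = [
    {"region": "North", "age_group": "30-39", "occupation": "farmer"},
    {"region": "North", "age_group": "30-39", "occupation": "farmer"},
    {"region": "South", "age_group": "60-69", "occupation": "judge"},
]

# The third record is unique on these three variables
print(risky_records(records, ["region", "age_group", "occupation"]))  # prints [2]
```

Flagged records would then be candidates for treatment such as recoding into broader categories or suppressing values; the choice of variables and threshold is a policy decision, not a technical one.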
Data and metadata preservation
Situation in many countries: documents in hard copy only,
outdated storage media, multiple versions of datasets, much
information lost (or never generated).
Goal: Data and documentation remain readable, meaningful,
understandable, and accessible. This requires managing hardware, software
and storage media (not only backups, but also “migration”)
On-going: IHSN-ICPSR guidelines (Open Archival Information System OAIS; ISO 14721)
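A basic building block of preservation is being able to verify that archived files have not changed or disappeared, for example after a migration to new storage media. The sketch below records a checksum for every file in a folder and later reports any file that is missing or altered. It is an illustration of the general idea of fixity checking, not an excerpt from the IHSN-ICPSR guidelines; the function names and folder layout are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(folder, manifest):
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

In use, the manifest would be built when a dataset is archived, stored alongside it, and checked again after every copy or migration. Checksums only detect change; keeping files readable still requires migrating formats and media over time.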
Conclusions and recommendations
– NSOs do not need to have all the features of
advanced data centers, but data archiving is part
of their mandate
– Documentation and preservation are a MUST,
even if you don’t disseminate
– Good practices and standards are relatively
easy to implement
– Good documentation of past surveys helps
improve the quality of future surveys