California Data Warehouse Project
The California Data Warehouse Project would
pull together data from disparate sources and formats and combine them
in a user-friendly interface. A prototype of the project is available at:
http://ssdc.ucsd.edu/cdw
What would these steps accomplish?
- Integrate data about California from:
  - Federal, state, and local government sources,
  - non-profit and other freely distributed data sources,
  - any format (e.g., CD-ROM, floppy disk, tape, Internet FTP, web sites).
- Provide a web-based interface to these data that would:
  - Meet the needs of a wide range of users (e.g., UC community, K-12, business, government, general public),
  - Allow users to easily locate data without knowing its origin (user self-sufficiency),
  - Allow users to cross-tabulate data from different sources,
  - Allow users to easily create maps, tables, and graphs from the data,
  - Allow users to download data for analysis with client software.
- Provide seamless access to data, metadata, and related text sources by integrating the Data Warehouse with the "Government Information Project." This could include information about other data (e.g., locally held CD-ROMs).
- Guarantee long-term access.
What are the start-up and ongoing costs
(hardware, software, human resources) and timelines attached to each action?
| Phase | Time | Software | Hardware | Human resources |
| --- | --- | --- | --- | --- |
| Development | 6 months | Utilize existing resources | Utilize existing resources | 0.5 PA III; 0.25 project coordinator*; librarian** |
| Initial | 12 months | Utilize existing resources | Utilize existing resources | 0.5 PA III; 0.25 project coordinator; librarian**; 0.5 clerical |
| Production | 18 months | SAS; Sybase; Infoseek | Server $30K | 0.25 PA III; 0.25 project coordinator; librarian**; 0.25 outreach and training |
* Project coordinator: responsible for design, oversight, training, and coordination.
** Collective participation throughout the project by government information specialists and data specialists.
What campus costs will be avoided by taking
these actions?
- Each campus could provide access to these data without having to expend local resources (hardware, software, and staff).
- Economies of scale will allow greater access at lower cost than duplicating access at each campus.
- By warehousing publicly available data, campuses avoid the cost of purchasing or repurchasing data from private vendors.
- A consistent architecture for data delivery facilitates better inter-campus collaboration for data access.
What issues must be resolved to take each
action? How can they be resolved?
GILS has already endorsed this project
and agreed to participate.
The major outstanding issue is the endorsement
of the project by CDL, SOPAG, and individual campuses, and the establishment
of funding for staffing, hardware, and software.
What kinds of expertise are needed to take
these actions?
| Expertise | Tasks |
| --- | --- |
| Data | Design, coordination, training, and outreach |
| Librarian/Government Information Specialist | Selection; user support, training, and outreach; coordination with agencies and campuses |
| Programming | Database, statistical, web |
| Clerical | Data entry, scanning, etc. |
What kinds of ongoing assignments and
structures (e.g., collaborative structures) are needed for each action?
(i.e., how can these actions be accomplished?)
A Project Team consisting of data archivists, government information
specialists, and programmers would provide long-term project oversight and
coordination. The Project Team would be backed up by govinfo@ucdavis
(UC/Stanford Government Information Librarians listserv) for feedback and
input.