The live feed info is available for the 3rd meeting of the LC Working Group on the Future of Bibliographic Control at
http://www.loc.gov/today/cyberlc/live.
I found out via the CC:DA email list, which, btw, should soon be viewable by the public. I commented that it's too bad CC:DA meetings can't be done that way, which would save the interns the work of compiling the minutes. As a former intern, I'm well aware of the work involved in ensuring accurate records.
Can CC:DA podcast meetings and accept the video as "the record" and leave the interns to summarize the content and outcomes of the meetings for distribution channels?
I'd like to investigate tools, costs, learning curve, etc. I know the Code4Libbers have podcasts going, I know LITA has podcasts going (I have not had the time, unfortunately, to view/listen to those podcasts). The idea is within the realm of possibility. What I don't know is the socio-political implications of making such a suggestion within the structures of ALA/ALCTS. I confess. I haven't read the ALA Handbook. I've treated it more like a reference work that I review on a need-to-know basis.
It's something that the CC:DA Task Force on Internal and External Communication has to investigate. I would love to hear others' thoughts on using social software tools to better do the work of CC:DA. Would people view podcasts of CC:DA meetings? Caveat emptor: they are lengthy.
Labels: CC:DA
Haworth recently announced the new
Journal of Library Metadata.
I feel a bit irritated every time a new LIS journal arrives on the scene which doesn't let authors retain full copyright. Haworth, to its credit, is a
SHERPA/RoMEO "green publisher," meaning that authors can archive pre-prints and post-prints of the work if they meet certain conditions. In the case of Haworth, those conditions include: the archiving must be on the author's website or the author's institutional website, there should be notice of the publisher's copyright and a citation pointing readers to the published version of the article, and the server upon which the pre/post-print is archived must be non-profit.
Sounds OK. Articles from this new journal will be available, in some form, as Open Access so why am I irritated? I'm not fond of
Haworth's copyright transfer agreement. Authors transfer full copyright to Haworth and retain limited rights of re-use rather than authors retaining their copyright and licensing publication privileges to Haworth.
As a long term strategy, it's not optimal. Authors don't need to sign away full rights to publishers and they shouldn't. It's a nit-picky thing for me. Publishers need permission to make the article available, to archive it/re-purpose to different formats when necessary, etc. It's great that the publishers allow authors to retain rights. I just think that in the very long term, it's not a good practice to let the publishers have it all just because they let authors keep a manifestation of a work on their own server to do what they will.
It's really a question of how much one trusts publishers to share any profit they may make from your work in the long term.
At least the individual subscription price for the
Journal of Library Metadata is a reasonable $48. I still chafe at any type of reader fee for metadata research, given the interoperability issues that face the metadata community. Less affluent libraries should be able to access the research up-front without relying on the individual vagaries of personal archiving practice. Just because Haworth allows authors to archive their articles, doesn't mean that those authors will archive those articles.
Under current practice, the only guaranteed, timely access to the "published" work is via the journal. When there are barriers to that journal, it doesn't serve the LIS community. It's not an easy black/white issue. It does cost money to review and produce the final article and the journal publishers are providing a service. Somebody needs to foot the bill.
We cannot develop new economic models, however, if we continue, as a profession, to support the status quo. I haven't decided yet if I'll read the new journal or write for it. Depends on the content, I suppose. I'm inclined to avoid it, however, and continue patronizing freely available OA journals instead.
Labels: metadata, open access
I'm glad that
Meredith and others have brought up the issue of staying sane. Librarian life can be stressful. There are reasons I sought such a long break between gigs. I need to decompress. I let myself get overworked, discouraged, run-down in the past few months and I owe it to myself to regain my health. My first month of vacation has been hectic and not quite the respite I need.
Part of regaining my health involves getting s**t done. There's a bathroom and bedroom to be painted and a home office needing a purge. I've been drinking the David Allen Kool-Aid so I'm going through all the steps of integrating those time management principles into my life. The idea is to get all those unfinished niggly things out of the way so I don't stress about them. I haven't gotten very far with the niggly things yet because I've been busy with gadgets, CC:DA work, culinary school, and ALA travel.
I lucked into a bit of money and was able to procure a MacBook and upgrade my cell phone to a Treo -- I'm hoping it will help me implement GTD. So far, it's just been yet-more-stuff-to-learn. The MacBook is (ostensibly) for my daughter. She has graciously allowed me to abscond with it for a couple of months. I've been geeking out installing a bunch of open source software and learning my way around Appledom. As a bonus, it's good for practicing the
techbootcamp unix cheat sheet commands (password=library). It is not a bonus for getting sh**t done. Evaluating and installing software takes time, especially when your software installations fail because you didn't know to drag the installer to the application folder to do the installation. Nothing like beginning with a new operating system to humble oneself. Add another new OS (Palm) to the mix and one really needs to get into the groove of
beginner's mind. My struggles to learn new tech have been ongoing since the day I left MFPOW.
My first few days off were spent with virtual colleagues writing
the preliminary report of the CC:DA Task Force on Internal and External Communication. I was hoping to write more about RDA ch.3. I haven't forgotten my promises to comment. I just have trouble finding dedicated time for writing coherently. See the part above re: stressing myself out. Task force deadlines have trumped analysis and opinion pieces for the blog. I can't even keep up with NGC4LIB. In the interest of staying sane, I hereby renege on my promises and add a very weak, "what they said!" to the others who have already very ably commented on Ch.3.
Speaking of getting sh**t done, CC:DA has the most work of any committee upon which I've ever served. I've been getting to know the history and purpose of CC:DA committee quite well during my participation on the TF. That is worth blogging about. I sense a bit of a disconnect between our purpose and our work. I've added it to my tickler file. I spent the better part of June madly reading all of the paperwork coming from the JSC and attending
culinary school in Ft. Bragg, CA. I've now got two courses towards my certification as a raw vegan chef under my belt.
I'm working on house and personal stuff so I can make space for all this writing I'm not doing. I have lots of opinions about the RDA scope document update and the various goings on. Right now, however, it's all I can do to keep up with the reading and committee voting stuff. ALA was its typical mad blur of meetings and travel. I'm happy to (finally) be back at home, even if only for a brief respite.
I'll be driving cross-continent and
cycling the Underground RR with the wife and kid for the last half of July/early August so posts are likely to be intermittent. There will be a lot of RDA stuff coming down the pipe for me to review for CC:DA, so that will form the bulk of my time online along with continuing to learn about my new Mac/Palm personal computing environment.
I really need to do better at doing nothing with my vacation time.
Labels: CC:DA, RDA
Diane Hillmann comments on a May 11 post to NGC4LIB by Karen Coyle. Karen says "The problem that we see today in the library world is that when there is a standard that is rising up to the point of being useful and usable by many in our community, it isn't clear where to take it so that it can move from being a neat hack to being a community standard," and suggests that ALA is the obvious body to promote library interests, at least in theory.
Diane asks "given this standards reality check from Karen, what are the implications for us?"
I say the implication for ALA is that the Divisions need to coordinate better on standards. They need to speed up the official channels of communication between committees. The extreme busyness of people contributes to the lack of standards work being done. Nobody wants more work. The other part of it is that we're not making effective use of social tools to do the business of the association. We create more work for ourselves by not using the time-saving new tools. The difficulty is that learning the tools takes time+effort=more work. There's no incentive to change.
That's starting to change (hooray for ALA communities, wikis, and blogs despite their growing pains! hooray for hiring Jenny Levine!). But I still have trouble convincing people to give web-based conferencing a go. The reason it takes months for a committee to write a report is that, even with email, it takes time to send out a doc, get responses, compile responses, synthesize and summarize, check back in with committee members, then take necessary actions.
Come to think of it, a committee probably only recommends actions. Another problem is assigning responsibility for action and following through to make sure it's done!
As a task force chair, I'd much rather have one single real-time discussion with the task force members to gather all comments at once. It's faster. I'd like to spend less time volunteering please. Perhaps if we did better with the social tools, we'd do better with the standards work? ALA already has some channels in place for standards development. I give you the example of CC:DA.
I just submitted a preliminary report from
CC:DA's Task Force on Internal and External communication. The TF reviewed
CC:DA's charge as well as
"Building international descriptive cataloging standards..." (the promotional "pamphlet" to explain to the masses just what-the-heck CC:DA does).
In the CC:DA charge section of the
"Building international..." document it says:
To develop official ALA positions on proposed international cataloging policies and standards pertaining to the committee’s area of responsibility and to advise the official ALA representative; or, if there is no official ALA representative, to act as the clearinghouse within ALA for review of these policies and standards and to serve as the formal liaison between ALA and the originating organizations.
Most of the committee scope described in "Building international..." is related to the development of AACR and interactions with the JSC. Yet it also says CC:DA's role is to develop official ALA positions on cataloging and related standards. This bullet point quoted above indicates, to me anyway, that CC:DA should be taking a proactive role in standards discussions within ALA. It also means we need to pay attention to the first two words, "to develop." The FBI calls that a clue, son. To develop implies taking action. (smile). I think this action needs to be both internal to ALA and external to other standards bodies. CC:DA has sucked at taking the external-to-ALA actions.
Take a look at the CC:DA roster, for example. Most of the external liaison members are from library or librarian associations. There weren't any non-library bodies represented until Diane Hillmann (for DCMI) and Curtiss Priest (for IEEE) were added.
The "Building international ... standards" document also says that CC:DA welcomes suggestions
*In applying standards for bibliographic control to new and emerging technologies
*In employing automated solutions to the development of descriptive cataloging records.
Yes, CC:DA welcomes suggestions but has really only been taking them from librariankind.
If CC:DA is supposed to do standards work, why hasn't it? The snark in me wants to say that it's because the minutiae of dealing with AACR and MARC takes up all of CC:DA's time and probably a forest's worth of paper. To be fair, there is the "pertaining to the committee’s area of responsibility" clause in the "Building international ... standards" document. AACR really is the bulk of CC:DA's area of responsibility as per the written charge. I can understand how we could collectively miss following through on a wee little suggestion to develop positions for ALA beyond AACR/RDA. I don't think it excuses the neglect, however. At CC:DA meetings we really don't much discuss standards beyond AACR/RDA (if we consider that a standard).
Betty Landesman, ALA's NISO rep, gives us a report each Midwinter and Annual, and she announces NISO proposals/votes on the CC:DA email list which gives committee members the opportunity to respond. I've tried to review those and give Betty feedback, but I just couldn't. My life is f.u.l.l. And I have no idea if other CC:DA members, voting or non, give Betty any feedback either. My sense is that nobody does, but you'd have to ask Betty.
I bear some of the blame for this lack of attention to the standards proposals as a voting member of CC:DA. Diane hit the nail on the head when she said the work of standards development doesn't happen, "mostly because we already have busy lives and sometimes our institutions don’t support such activity very well. " The RDA publication process has CC:DA members in a mire of reading/thinking/responding work. Not an excuse for not paying attention to standards. Especially when I hold the radical view that ALA should insist on decoupling RDA development from the Committee of Principles' publication schedule. I can't very well argue that radical stance unless CC:DA members are willing to be proactive in their involvement with the other related standards work.
I think it means that we need to add more people to CC:DA in order to spread the work load around a bit more. I also think it means that the CC:DA TF on Communication really needs to come up with concrete, do-able, alternatives to CC:DA's current methods of disseminating information.
Labels: CC:DA, metadata, RDA, standards
The version of ALA OITP's
Principles for Digital Content is available from the Digitization Policy Task Force's blog. It incorporates feedback from the comment period and will be going to Council for approval at Annual.
Labels: ALA OITP values
*with apologies to
Nicole Engard and her excellent blog
When I try to teach myself something new, it generally (a) takes much longer than I anticipated and (b) I end up learning a few more things than I intended. My experience today was another example of this.
I'm tying up final threads at my soon-to-be FPOW. One of the tasks on my to-do list is to wipe the hard drive on my laptop. Cool! I thinks to myself. It's a perfect opportunity to make use of some of my newly acquired
techbootcamp knowledge.
When we started going through basic *nix, our fearless instructor told us about
Knoppix. For those of you not in the know, Knoppix is a Linux distribution which can run live from CD/DVD/thumb drive. Knoppix has a utility called "shred" which does a good job with wiping. I figured I'd go with Knoppix/shred over something like
dban because I could continue using the laptop after the wipe by running the OS from the DVD drive. So I moseyed on over to the Knoppix download page and started my grand adventure.
I learned a thing or two. First, I figured out that BitTorrent is a faster download than a regular old mirror site. Well no duh! says you. But, like Dean Hendrix discovered during his recent analysis of librarian use of peer-2-peer networks**, I've been like most other librarians -- hopelessly slow with the uptake.
Then I realized that I needed to perform checksum verification on the things I download, especially with something as powerful as an operating system. So I had to download MD5summer (.md5 being a checksum file type) and actually do a checksum comparison between my downloaded files and the originals. I've known about checksums in a theoretical sense for ages, due to my interest in digital preservation and authenticity. I'd just never actually used them.
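For anyone who wants to see what that comparison amounts to, here is a minimal Python sketch of the same idea. It is not what MD5summer does internally, just an illustration. It assumes the .md5 file uses the common one-entry-per-line layout of a hex digest followed by a file name (optionally prefixed with `*`), and the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a DVD-sized image never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_line(line):
    """Split one line of a .md5 file into (expected_digest, file_name).

    Assumed layout: "<hex digest> <name>" or "<hex digest> *<name>".
    """
    expected, name = line.strip().split(None, 1)
    return expected.lower(), name.lstrip("*")


def verify(md5_file):
    """Check each file listed in md5_file; return {file_name: True or False}.

    Listed files are looked up in the same folder as the .md5 file.
    """
    md5_path = Path(md5_file)
    results = {}
    for line in md5_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = parse_md5_line(line)
        results[name] = md5_of_file(md5_path.parent / name) == expected
    return results
```

Calling `verify()` on a downloaded .md5 file sitting next to the image it describes returns a dictionary of file names mapped to `True` (digests match) or `False` (they don't). A match only says the download is the same as what the mirror published; it says nothing about whether the burn to DVD went well.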
Finally, I had to poke around in the BIOS to get my laptop to prioritize the DVD drive when looking for an operating system from which to boot.
When all was said and done (about 4 hours later), I had learned BitTorrent, MD5summer, revisited a computer BIOS for the first time since the age of DOS, installed a plug-in so Windows Explorer could burn an .iso file as an image, and burned a verified copy of Knoppix to DVD.
It.still.didn't.work.
I suspect I screwed up something with the checksum verification and that my DVD is corrupted. Whatever. I've spent too much time on this little adventure. I'll use something simpler and just hand over the laptop after the wipe. It's not worth the effort to get a Knoppix DVD when I'm only using the equipment for another day.
It is frustrating to spend time doing something which should be incredibly simple if you know what the heck you're doing (download OS to DVD, boot laptop from DVD, wipe). I do consider learning the additional programs and reviewing BIOS to be time well spent. The thing is to actually remember that learning can be a slow-going and frustrating process without a guaranteed result. This is a good lesson to remember as we move forward with techbootcamp.
When the more technically inclined teach us newbies, they can toss off statements like, "oh, just download Knoppix and boot your laptop from there" without thinking about the prior knowledge required or the necessary computing environment. I'm reminded to consider these factors when I do any type of training. I'll also remember to schedule more time for these types of things.
**see Hendrix, Dean. "Peer-to-Peer (P2P) Knowledge, Use, and Attitudes of Academic Librarians." portal: Libraries and the Academy, 7(2) April 2007.
Muse subscribers can view the article here:
http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v007/7.2hendrix.html
Labels: knoppix, techbootcamp, what I learned today
Today was the third installment of
techbootcamp. We've now configured server hardware and gone through basic *nix commands. We can grep with the best of them now, but I still get a bit confused when piping commands. More practice will help.
Next steps? Take on a few projects and use them to train ourselves on particular open source applications. We have a few ideas in mind. My lib school student friends intend to digitize some out-of-copyright knitting patterns, get'em described, and make them available and searchable on the interbunny. They're considering using Greenstone.
For my part, I'm going to do something with ePrints. Not quite sure just what that will be yet, but I suspect it will involve images, batch importing, and metadata crosswalking. I'm totally open to suggestions if anybody has any.
Posts will be sporadic between now and 8/20. I'm leaving my current place of work on 5/30. I'm off on a few weeks vacation (
culinary school!) then ALA, then more vacation, before starting the new job in August. I will not have a laptop. I've avoided purchasing one because I've been using my soon-to-be-former-place-of-work's equipment. That, obviously, can't continue after I quit. I do intend to purchase my own equipment but I'm not sure yet when that will happen. Depends on how much I spend on vacation.
Labels: oss4lib, projects, techbootcamp
via
Digital Koans: The Digital Library Federation and OCLC have released their
Registry of Digital Masters Working Group’s Registry of Digital Masters Record Creation Guidelines.
This is cool. The registry means that (potentially) libraries and cultural heritage institutions can save time/effort/money in terms of managing digital objects for long term preservation. One thing everybody at DigCCurr agreed on: we need to figure out the economic models for sustaining digital preservation projects. Creating tools to avoid duplicating effort is a step in the right direction. No comments from the peanut gallery about OCLC and monopolies or the use of MARC format please.
Labels: digital preservation, economic models, registries
I'm a few days late reporting on
this opinion piece in the Washington Post by Fran Berman and Jim Barksdale (I was @ IUG all week, all apologies).
Barksdale, as you may recall, used to be CEO of Netscape. Berman is the director of the San Diego Supercomputer Center (full disclosure: one of my FPOW). Both are heavily involved in NDIIPP. Barksdale is on the advisory board. SDSC is a major player in digital preservation.
They write eloquently about the need for funding digital preservation projects. They use the familiar stories about heroic recovery of Census and NASA data. The article is notable because it's being published in the regular news. I say bravo. In the long term, increasing awareness of the problem will assist in generating necessary funds.
I don't harbor any illusions that Congress is going to return the millions they rescinded from the NDIIPP. I do think that having the issues in the general press will help us in the academic repository realm in our communicating with research faculty. In my experience, many faculty members don't think much about the long term preservation of their data and scholarship. The Washington Post is another avenue for faculty to get the message.
In marketing they say that a person needs to see a message seven or more times in several different ways before it sinks into consciousness. So, yay. It's great that the issue made it into mainstream media in
the United States. Britain has already had some success in that area.
See also:
http://observer.guardian.co.uk/uk_news/story/0,6903,661093,00.html
http://education.guardian.co.uk/elearning/story/0,,916073,00.html
Labels: funding, media, NDIIPP
Friends of mine run the annual
"Grilled Cheese Invitational". I didn't get a chance to compete last year because I went to Canada so I could get legally married. It seems like it's getting bigger and more notorious as the years go by. I think there have been five or six so far. I can't recommend it highly enough. These folks take their grilled "sammiches" incredibly seriously. We're talking presentations that include costumes and cheerleaders. It is a heck of a lot of fun.
Grilled Cheese Invitational from Hot Knivez on Vimeo
Labels: life1.0
The
techbootcamp crew got together again yesterday. It's been difficult coordinating schedules but we finally managed. I think it should be a bit easier from now on, since the library school students are finishing up their quarter.
My camera is acting a bit funny, so I don't have pictures this time. This week we discussed the basics of Linux -- files, directories, command structure (command+arguments+flags), common commands for moving files around, shell navigation tips. T. brought her chihuahua puppy Oberon which lightened the mood. T & I both got accounts so they can practice logging into a shell and playing around with the commands.
Slowly, but surely, we're working towards the day when techbootcamp can become a working lab where we can get hands-on experience with the major Open Source applications in digital archiving. Yay.
Next meeting will be 5/27 3pm at the
Boneless Ranch. I should have my camera fixed by then.
Labels: techbootcamp
Karen Coyle interviews Diane Hillmann about the outcomes of a recent meeting between the editor of RDA, some members of the Joint Steering Committee for Development of RDA,* and other stakeholders.
Diane Hillmann has been a tireless champion encouraging the JSC to work with other metadata communities to develop RDA.
The RDA/DCMI collaboration will include an RDA application profile for DC and a formal element vocabulary. A controlled yet extensible element vocabulary is necessary for describing carriers as per the revised Ch.3 of RDA. No, I haven't forgotten to write up my notes on that, btw. With any luck, I should get to that today!
*note the new name! for those of you not in the know, the JSC used to be for the revision of AACR
Labels: DCMI, interoperability, metadata, RDA, standards
Administrivia first - It seems as if blogger ate my comments settings so I didn't receive the auto-notify that I had comments awaiting moderation. Many apologies to those of you who commented. I've got everything reset now.
So. I've discovered the downside of migrating. It's reading the old documents you wrote and cringing in embarrassment. Way back in the day I used to write a column for San Diego Sidewalk. Remember the Sidewalk sites? They were Microsoft's answer to Yahoo local. They
ended up getting bought by CitySearch. The column was called "Styletramps" and it appeared in the GLBT section of the site. I was the queer Vanna White of online shopping. The idea was to find fun, interesting things (fashion, music, whatever), and write about them. It was a fun gig -- they paid me $100 per column and the columns were small at 250 words.
I just pulled out the disk which had my columns and writing. Ach. The horror. The horror. Here's the bio I wrote for them:
I'm Laura Smart. Yes, that is my real name. No relation to Maxwell, but I was jealous when he married Agent 99. Barbara Feldman was not only a babe but a snappy dresser to boot! Like almost everyone else in this state I came from somewhere else -- London, Ontario to be precise. When I'm not style-tramping I'm a librarian (a.k.a. "Information Diva"). My multifarious dabbling includes playing keyboards, writing bad poetry, drawing, constantly redecorating my various spaces, matchmaking (always a bridesmaid, sigh!), and fashion on a shoe-string. Susie Bright is my hero. I'm dying to become Slater-Kinney's lead groupie and the first commercially successful female drag-queen. Maybe it's really a secret yearning to become Pamela DesBarres? While I convince Malcolm MacLaren to manage me, I'll continue my quest for San Diego's kookiest accoutrements.
I guess there really is a big difference between age 25 and 35. I am sooooooooo not that person anymore. Nowadays I spend my time training for long-range cycling trips with my wife and daughter, renovating my house, meditating at the Zen Center of Los Angeles, and making luscious raw vegan cuisine (fyi, I eat 80% raw ... but no, I'm not a vegetarian. I just play one on t.v.)
Labels: digital preservation, electronic files, life1.0, migration, refreshing, styletramp
I'm in the process of doing spring cleaning in my office(s). I pulled out a bunch of 3.5" floppies and zip disks. A big bunch. I'm cursing myself for ignoring the situation. I should have reviewed these files and migrated them years ago. Fortunately I still have access to both types of disk drive.
I found papers dating back to my undergraduate years -- that's 1990-94. Most of my electronic files have updated easily to current formats. The older ones were a bit trickier. They were in WordPerfect 5.1 for DOS. They ended up having some garbage in the text when I converted, which was a pain. Now I'm just enduring the tedium of viewing the contents of the disks, deciding what to keep, then wiping the disks so somebody else can use them (anybody want a pile of free disks?).
I'm going to move my files to dedicated server space rather than keeping them on fixed media. I think this will help with future refreshing and/or migrations. I definitely will visit my files a bit more frequently than every 10 years. I suppose I should do it each year when the time changes from standard to daylight savings and vice versa. Just like changing batteries in your smoke detectors.
I'm lucky. I didn't lose anything I value -- except perhaps my time. This is probably better done more frequently with fewer files.
Labels: digital preservation, electronic files, migration, refreshing
I've fixed the spelling and rendering issues in those blog entries which were raw dumps of the notes I took during the DigCCurr sessions. Apologies for their initial roughness -- when I said they were raw, I meant it. It didn't help that I was using Appleworks to draft the notes. Blogger did NOT like the text encoding Appleworks used and therefore rendered all diacritics in a very messy fashion.
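For the curious, here is a small Python sketch of how this kind of mess can happen. It is only an illustration under an assumption: the post doesn't say which encodings were actually involved, so the sketch guesses that the notes were written as Mac OS Roman and then read as Windows-1252. The `misread` helper and the sample text are made up for the example.

```python
def misread(text, written_as="mac_roman", read_as="cp1252"):
    """Encode text with one codec, then decode the same bytes with another.

    The default codec pair is an assumption for illustration only.
    """
    raw_bytes = text.encode(written_as)
    # errors="replace" substitutes a marker for bytes the reading codec
    # cannot map, instead of raising an exception.
    return raw_bytes.decode(read_as, errors="replace")


original = "café, naïve, Müller"
garbled = misread(original)

print(original)
print(garbled)

# Reading the bytes back with the codec they were written in restores the text.
restored = original.encode("mac_roman").decode("mac_roman")
print(restored == original)
```

Plain unaccented letters come through untouched because both codecs agree on the ASCII range; only the diacritics land on different byte values, which fits the symptom of otherwise readable notes with mangled accents.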
I don't think I'll cover any more conferences in this manner. I don't think it's useful for those who aren't there. The conference presentations will be put on the web at the DigCCurr web site at some point if one is interested in seeing the full content/context. I think it's far more useful to get some summary and analysis of the proceedings.
All in all I thought it was a fabulous conference.
The good?
*Reconnecting with colleagues, especially my former coworkers from SDSC and the UC.
*Getting up-to-date with the European digital curation projects
*Hearing Liz Bishoff speak -- she is incredibly dynamic.
*Realizing that everybody is struggling to define the problem space of digital curation
The bad?
*The food at the Friday Center was stale and in some cases rotten (wilted and rotting salad on both days). As a vegetarian, my entree choices were limited to starch, starch, and more starch.
*Location - U.N.C. is gorgeous, but I mentioned feeling isolated in suburbia. Don't try to be a pedestrian near the Friday Center unless you have a fair amount of courage.
My take-aways?
Nobody has a good definition of digital curation or digital preservation. Partnerships and people management are key to the success of any digital curation project. Librarians and archivists have to realize that we are not the only profession which has been radically transformed in the past 10 years. No single institution or profession can provide all of the necessary skills. In addition, no single institution can provide the economic sustainability needed to ensure longevity of digital objects. Digital curators need to pay more attention to developing business cases for archival repositories. Most projects are currently funded as projects and we need to move from soft-funding to production level services. In order to make the business case we should pay better attention to the demand side of the equation. What do the content providers need? For what community is the content being archived? Focusing on fulfilling those needs can assist in figuring out (a) what to do and (b) how much it's realistically going to cost.
I'm looking forward to reading the published papers from the conference. All in all, I'm glad I went.
Labels: DigCCurr 2007
Cal Lee and Cliff Lynch shared the results of the survey and lessons from individual feedback.
Cal Lee went first. He gave a big caveat that the data is as close to research as "The Situation Room" is to journalism.
Survey Questions
1. What do you see as the biggest digital curation challenges in your institution?
*A few high level categories: need to change/influence beliefs/perceptions of people outside; skilled IT staff that understand the issues; expectation management (don't promise the impossible without expanded staff and/or time line); organizational commitment; insufficient buy-in from the top;
*lack of IT support either internal or outsourced
*essential technological components are lacking
*money or funding
*ownership of the problem-space (other disciplines besides LIS/AS)
*how to identify roles and responsibilities
*planning, quality control
*skills of digital curators-- lack of wide spread competencies
*volume of data, long term preservation and access, metadata
*define what digital curation is, what it takes to do it
Discussion: very difficult to disambiguate the challenges listed above since they all touch upon each other.
2. What are the most important topics to cover in digital curation education?
*High level conceptual orientation, being aware of information/archival theory, OAIS model, risk management
*Functions and tasks like cost modeling, systems analysis, cataloging, web design
*Artifacts
*Standards
*Current landscape: economic models, major players, trends, users and services
3. What would you look for if you were hiring a digital curator for your institution?
*Communication skills were mentioned far more frequently than any other skill
*Programming
*Leadership
*Project management
*Metadata
*Service orientation, history of profession
4. What are the skills required that are now currently lacking?
*technology and IT
*programming
*server administration
*knowing alternative technologies (not just picking one, but knowing how to evaluate between options)
*a good BS radar
One comment in the survey about distributing these skills over many people so one resignation doesn't hold up production
*people with fundamental respect for research process
5. Other comments?
*Allow for retooling for professional development for current LIS/Archivists but also for CS and systems people to remediate in the LIS stuff
*Metadata
*comment on being comfortable with digital curation but less so with digital curators
*don't underestimate the importance of management skills
*strengthen still useful skills from LIS/AS
*think about who you want to recruit -- do they need the discipline degree?
*eventually this will be "the" curriculum, we should teach it within the context of everything else
*seems to be more suited to a specialization on top of other degrees
Cal's comment on the "vacancy in the professions": this work is distributed among a bunch of different professionals. Who will be responsible?
Cliff Lynch - giving a mix of his opinion, reaction, based on many conversations with people here and in the past year. Some study of what was and wasn't covered in the program at this symposium.
Suggest that language does matter. We moved from the phrase "data curation" which comes out of developments in the sciences such as Chris Greer's long life data report to the "truly frightening" term digital curation--which we may want to consider getting rid of. We start from the perspective of archives and records management which are traditionally marginalized although recognized as important.
Better to recognize that we're not the only ones being impacted by large scale computing and networking. Everything has changed radically in the last couple of years. CL just came from NSF/JISC workshop -- it started about digital repositories, then morphed into talking about "data driven science" and then finally became about the entirely new ways of doing science/research. The new types of roles emerging.
"Research facilitators" start looking around S&E facilities of large grant receiving institutions -- they are finding ways to squirrel these people into their organizations (i.e. "staff technologist")
Humanities "critical editions" used to be important -- are they still?
There is a set of activities around data curation that we need to define, specifically regarding management and preservation of scholarly data. One set of activities for long term memory organizations. One set of activities for the creation of the data. Just how much specific scholarly expertise do you need? Critical question.
Difference in use of word "curatorial" in bio-informatics/biology data sets. It's more of a critical editorial role.
Digital curation vs. data curation. Curate. Recognize that once upon a time that libraries had curators who built and managed collections then this role got sliced and diced into all the various types of librarians you've got now (bibliographers, catalogers, etc.). The return to curation as term reflects the changes that need to happen in the way we think about acquisitions in libraries.
Other comments: stunning how differently the participants view curation. We all have different opinions of what skills somebody with a certificate in digital curation would have.
We need people who can sort through social, organizational, economic issues around sharing, destruction, anonymous-ization of information resources across time. Not just a narrow view of records management. Broader view needed of social policy and impact in order to make the necessary case for stewardship.
We have a lot of case law that gives primacy to individual ownership rather than social commons.
Economics - besides business and cost models for individual organizations, it is a social good and needs that type of funding rather than a bunch of organizations recharging each other in a circle.
Risk management - very important but difficult to do as it's hard to quantify value of irreplaceable objects. CL says we haven't yet discussed an acceptable loss rate. Keeping all bits for perpetuity is not doable from engineering standpoint. Must get explicit about that.
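One way to get explicit about an acceptable loss rate is to run the arithmetic. The sketch below is not from Lynch's talk. It assumes a constant, independent annual loss probability per object (real archives will not behave that neatly), and the function name and the numbers in the example are made up for illustration.

```python
def expected_survivors(objects: int, annual_loss_rate: float, years: int) -> float:
    """Expected number of objects still intact after `years`, assuming each
    object is lost independently with the same probability every year."""
    if not 0.0 <= annual_loss_rate <= 1.0:
        raise ValueError("annual_loss_rate must be between 0 and 1")
    return objects * (1.0 - annual_loss_rate) ** years


if __name__ == "__main__":
    # Hypothetical collection: one million objects, 0.1% lost per year.
    for years in (10, 50, 100):
        kept = expected_survivors(1_000_000, 0.001, years)
        print(f"after {years:3d} years: about {kept:,.0f} objects remain")
```

Even a loss rate that sounds tiny compounds over archival time spans, which is why the acceptable figure has to be stated rather than assumed to be zero.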
If your bits don't make it to next week all discussion is moot. We're in an environment where our information is exposed to all kinds of complex threats all the time.
How do you teach the next generation how to do something we don't know what to do? Go back to original principles underlying sound curation as place to start. Foundational principles are a good place to frame analysis.
Labels: DigCCurr 2007
DigCCurr 2007: Concurrent session: Digital Curation in Practice
I attended "Science and Biomedical Data"
Speakers: Milton Corn, Don Sawyer, Tyler Waters
Milton Corn "Archiving the Phenome"
Phenome - total mass of physical and mental facts known about you. It's coordinating genome info with patient info (date of birth, hair color, cholesterol, etc.). (e.g. OSHA), state laws vary, need to maintain a paper record for preservation purposes under debate. Text can substitute for non-textual information (e.g. x-ray report).
2. Well-being of the patient. Diagnosis and prescription of new illness can be influenced by past history. Implies records needed for life-time of patient. NOT a legal requirement. Hard to assemble from distributed sources, argument for personal health record or "super" repositories.
3. Well-being of family/nation. Patient's health record in genomic era of value to family, and to the entire population. Secondary use of health records of value to health services research, public health. Implies preservation "forever."
How to archive?
Same problems as for all digital archives plus:
*multiple content owners per patient
*variation in software, hardware, data formats, ontologies etc.
*privacy issues -- HUGE issues
*ownership of data not always clear
*multiple media, text, graphics, images all included
*Not seen as a problem in the U.S. by AMIA, AMA, NARA, MLA, AHIMA, DHHS, AHA (but modest discussion in U.K., Belgium, India, Australia)
Corn surveying current practice, results so far:
*DHHS: no response
*Large HMO: no response
*Hospitals and offices -- no archiving policy ("we plan to keep forever"), privacy safeguards for daily use, definitive record is a mix of paper and electronic and may not include images or graphics, how to manage old data when the EHR system is changed remains a problem. N.B. one practitioner said he erases colonoscopy videos after reading to prevent second guessing later by lawyers.
Summary: curation of clinical data
*not a problem now, at least it's not recognized yet
*will become a problem as soon as size, migration costs escalate esp. with imaging
*preservation by CIO may, in fact, work for solvent enterprises (hospitals, pharmacies, etc.) i.e. the public pays
*situation for office practices uncertain
*Can health care system conglomerate all health data for an individual? Unlikely unless patient is the custodian.
Don Sawyer "Digital Curation at the National Space Science Data Center"
Overview: NSSDC requirements and digital curation, NSSDC holdings and archival services,
NSSDC requirements:
*functions as the space science permanent data/metadata repository
*provides the space science community with data stewardship guidance and support. Data made available to the research community by various repositories should be well documented in order to support independent usability via, for example, virtual observatory access
*NSSDC as a repository making unique data/metadata available must participate in Virtual Observatory development efforts to assist in the practical evolution of these concepts
NSSDC uses OAIS concepts
Data providers:
*NASA's Space Science Active Archives typically under written agreements (MOUs)
*Space Science Space Flight Projects
Users:
NASA Space Science Archives
Space Science Projects
Individual researchers
General public
NASA headquarters
Digital holdings: acquiring data for 40+ years, currently 47 TB, reaching 270 TB by 2010, 1300+ experiments from 375 US and international spacecraft, over 4400 data collections (typically each with a large number of files)
NSSDC Archival information services
*permanent archive: long-term curation, uses AIP implementation, data may be repackaged and/or transformed to maintain accessibility and usability
*Second archive: data also held in another archive, NSSDC holdings may be AIP form, data may be repackaged and/or reversibly transformed
*Third archive...
Administration activities
External: MOUs with various active archives, respond to NASA HQ requests, monitor progress of SAMPEX resident archive (home after project ends)
Internal: Oversee maintenance and modernization of infrastructure including systems administration (e.g. low cost Linux), manage personnel and physical space, oversee refreshing of tapes in archive every 6 yrs or less, oversee migration of legacy data from 9trk/3480 tape archive into current media
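The tape refresh policy is easy to express as a check. This is a minimal sketch of the "every 6 yrs or less" rule only; the tape ids, the dictionary layout, and the function name are hypothetical and say nothing about how NSSDC actually tracks its media.

```python
from datetime import date

REFRESH_INTERVAL_YEARS = 6  # from the talk: tapes refreshed every 6 years or less


def tapes_due(last_written: dict[str, date], today: date) -> list[str]:
    """Return the ids of tapes whose last write is at least
    REFRESH_INTERVAL_YEARS old, in sorted order."""
    due = []
    for tape_id, written in sorted(last_written.items()):
        age_years = (today - written).days / 365.25
        if age_years >= REFRESH_INTERVAL_YEARS:
            due.append(tape_id)
    return due


if __name__ == "__main__":
    # Hypothetical inventory: tape id -> date the tape was last written.
    inventory = {"T001": date(2000, 1, 1), "T002": date(2005, 6, 1)}
    print(tapes_due(inventory, date(2007, 4, 20)))
```

A production schedule would flag tapes some margin before the six-year mark so the refresh finishes in time.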
Ingest activities
*Development: develop, maintain and enhance new AIP ingest software, enhance remote submission information package and AIP creation software (MPGA) to support non-Linux platforms, large SIPs and reliable electronic delivery of SIPs
*Operations: identify current/expected missions, collections, research and organize information, populate data management database
Archival storage
*development: develop upgrades to AIP storage manager, develop provenance management system, develop integrated document management preservation system
*operations: manage media and AIPs for 3 service levels
Data management
*maintain descriptive information database to include photo searching & support automated ingest, revise database to normalize and streamline infrastructure, design and implement XML mark-up of metadata producing systems to enhance finding aids
*participate in appropriate registries in Space sciences (e.g. heliophysics virtual observatories)
*provide general request and access support
Preservation Planning Activities
*External: continue participation/leadership in standards activities, monitor technology trends, sponsor NASA-wide workshop on archiving and metadata standards, provide curation guidance regarding documentation, database reports etc.
Key staff roles and skills
*Curation scientists: PhD in space science discipline, extensive handling and analysis experience
*Information architect
*Systems engineers
*Database administrator
*Operations manager
*Archive Head: PhD in space science discipline
Conclusions:
*Need science discipline experts with curation training (curation scientists) for interacting with data providers, data users
*Need computer professionals with curation training, working with curation scientists, for development and operation of internal systems and to interact with similar personnel at data provider sites
*Desire data providers with 'preservation understanding' to assist with ingest.
Tyler Waters "To Stand the Test of Time" Report on workshop of the same name
(**ed. note presenter went incredibly fast and it was quite difficult to keep up, pardon the brevity of these raw notes in advance)
Workshop findings
*The ecology of digital data reflects a distributed array of stakeholders, institutional arrangements, and repositories with a variety of policies and practices
*The scale of the challenge regarding the stewardship of digital data requires that responsibilities be distributed across multiple entities and partnerships that engage institutions, disciplines and interdisciplinary domains
*Historically universities have played a leadership role in advancement of knowledge and shouldered substantial responsibility for the long term preservation of knowledge ... an expanded role for some research and academic libraries and universities along with other partners, in digital data stewardship
*data is distributed, heterogeneous
*stewardship involves both preservation and curation and should be throughout the research life cycle.
Workshop recommendations
*NSF should facilitate the establishment of a sustainable institutional framework for long-term stewardship of data. This framework should involve multiple stakeholders by:
*supporting the research and development required to understand, model,
*supporting training and educational programs to develop a new workforce in data science both within NSF and in cooperation with other agencies, and...
*developing, supporting, and promoting education efforts to effect ...??
Also
1. Fund projects that address issues concerning ingest, archiving, and reuse of data by multiple communities
2. Foster the training and development of a new workforce in data science
3. Support the development of usable and useful tools
4. ??
5. include data management plans in the proposal submission process
6. NSF should encourage the development of data sharing policies for programs involving community data
URL for full report "To Stand the Test of Time - Long-term stewardship of digital data sets in science and engineering"
http://www.arl.org/bm~doc/digdatarpt.pdf
Question re: NSF funding models for data curation centers
*want proposals in domain science areas, usually funding for 5 years and can be renewed for another 5 years
Labels: DigCCurr 2007
DigCCurr 2007: Concurrent session: Building Capabilities for Digital Curation
I attended "Defining Capabilities" Speakers: Liz Bishoff, Nancy McGovern, Oya Rieger
Liz Bishoff: Digital Preservation Assessment: Readying Cultural Heritage Institutions for Digital Preservation
Benchmark paper 1996 "Preserving Digital Information" by ARL defined the issues and possible solutions to digital preservation. Cultural heritage includes scientific as well as arts.
The number of projects doing digital preservation is minuscule compared to the number of cultural heritage institutions that are digitizing, or having born-digital items. The publishing of the ARL white paper indicated that digitization can be preservation. More funding agencies have preservation solutions in their requirements. There are emerging state, national, and international initiatives for digital preservation. Are all these projects ready for managed digital preservation?
Bishoff surveyed institutions on their readiness. Anne Kenney at Cornell also studied institutions participating in NEH funding in 2003-5; the Cornell study results are available in RLG DigiNews. Cornell found that 90% were still using CD/DVD for digital storage. That number is now about 70%. Only 50% had policies, and only 30% had implemented those policies.
2005 NEDCC Survey - 66% of institutions had no one responsible for digital preservation, paralleled Kenney's findings. Also a fair number of institutions indicated that they had backed up only once (or not at all!!!)
Findings from Bishoff's survey: Issue of digital preservation is just now coming to the forefront of discussion and action. Many institutions are still at the project stage and have not yet gotten to the ongoing program stage. Written policies and documented digital preservation practices are lacking. Preservation/conservation staff are generally NOT directly involved in many of the digital initiatives.
They also found: Few have a coordinated institutional approach to their digital initiatives, especially in the areas of standards (imaging, metadata), quality control, access, promotion, digital preservation. A big lack of understanding of when an institution has "born digital" material. CD/DVD is the major storage medium but institutions are moving to networked servers. Data on CD/DVD is refreshed with lengthy periods between refreshes. Quality control of master images is inconsistent at best. Education is important before doing a digital preservation project. Ability to advocate for digital preservation is lacking at many institutions. Funding is primarily through local funds and grants.
Areas of policy which support digital content: mission and goals, collection development, emergency preparedness, exhibitions, preservation, strategic planning, public services, rights and licensing.
If the institution is outsourcing, does the vendor follow the elements of a trusted digital repository (TRAC?)? Financial viability of the company you choose is very important.
Types of recommendations: improved documentation (continuity planning, work flow processes, etc.), review digital preservation activities including refreshing schedules, quality control, etc., review system back-up procedures and implement off site storage.
So what does it mean? focus of long term preservation has been on the technology and standards, certification, etc. to build the infrastructure. To make it reality we now need to
*expand advocates for long term preservation
*expand the knowledge base of practitioners
*move from digital project to digital program
*integrate preservation into all aspects of digital life-cycle
*develop best practices
*make policy examples available
Education needs to be moved to the state and regional level. Also we need both professional AND continuing education. That needs to focus on technology and standards, policies and tactical strategies (development and implementation), work flow and documentation, business planning and all that it involves such as market research, financial analysis and planning.
Conclusion: progress is being made and awareness of its importance needs to increase. Most institutions which are doing digitization, however, are not doing the basic preservation activities.
Question - what is the definition of digital preservation? Response from ALCTS Preservation co-chair -- they are sending out a definition to various email lists in the next couple of weeks.
Nancy McGovern: Canary in a Coal Mine: A digital preservation response to technological change
How to deal with open-ended change? We've lacked specificity and scope about how to respond to that change. How do we detect things that might have adverse implications for digital preservation? How do you go about doing the assessment?
Outline: technology response requirements, common response, scope of interest, priorities for digital preservation, timing response to technology.
Technology response: the call for responding came in the 1996 seminal paper. The specification is most explicit in OAIS.
OAIS monitor technology: objective: track emerging technologies, information standards, computing, platforms. purpose: avoid obsolescence.
Examples of technology watch: DPC, DCC, DigiCult, LITA, PRONOM
Characteristics: range in services provided by technology watch services reflects absence of definition. Providers select topics not community, lack access to accumulated data, defined levels of service is rare (detailed synopsis? headline?).
Community formalization: Digital preservation for museums CHIN 2004 service requirements, LIFE project UCL/BL, 2006. Strategic priorities of SAA 2006-7 calls for leadership and training on how to respond to technological change.
Scope of interest: macro taxonomy
Object - file formats, media metadata
Collection - relationships, metadata
Repository - software, tools, modules
Platform - protocols, security, software, hardware
Scope: micro taxonomy
35 technology types enable OAIS
Examples: communication (ability to convey message), logs, policy enforcement
Priorities for digital preservation: Contact, interaction, exploitation, risk management, automation.
Contact: requires direct contact with digital content
Interaction: must respond to, not just be made aware of, changes in digital content
Exploitation: potential to contribute to digital preservation strategies by exploiting opportunities
Risk management: participates in the avoidance of risks to integrity, longevity, or authenticity
Automation: potential to perform more effectively
Timing response is important
Identify potential new technology, monitor, assess, respond, act to avoid obsolescence of existing technologies.
Technology responsiveness - Community objectives:
*accumulate current and historical information
*develop competencies and tools
*incorporate community developments
*build a network of contributors and users
*ensure sustainability
Question: is there a need to have some kind of peer review process/consensus building on deciding whether a technology matters or not or how to go about implementing collaborative technology watch?
Answer: Assessment is key. Organizations should be able to pick the right size thing for them which fits their requirements. People must be able to picture themselves in the results.
Digital preservation is research and development although we do it in a production environment we need to keep questioning and monitoring and assessing. William Gibson "The future is already here, it's just not evenly distributed yet."
Oya Rieger: Select for Success: Key principles in assessing Repository Models
Within a life-cycle framework, digital curation involves a series of technical, intellectual, and managerial activities in support of stewardship for digitized or born-digital information assets.
What is a repository system? A system to capture, store, index, manage, preserve, and deliver digital objects.
Factors in choosing a repository model
*development characteristics
*financial sustainability
*digital library infrastructure
*interoperability and support for standards
*institutional policies and practices
*support of archival business requirements
*content type characteristics
*preservation functionality
*usability (staff and end-user)
*search, browse, access features
See Art Libraries Society of North America "Digital Image Database Standards Checklist"
RLG/OCLC "Trustworthy repository certification"
Key principles in selecting a repository
1. Identify key stakeholders (users, programmers, subject experts, etc.). Doing so builds awareness and trust, gathers feedback, gets support, expands resources, and helps you understand risks
2. Conduct needs assessment to characterize your environment. Include documents (document type, condition, metadata attributes, selection criteria, usage restrictions, relation to other collections), users, and resources (available staff, money)
3. Explore resource requirements. Institutional repositories SPEC Kit: start-up costs range from $8,000 to $1,800,000 (mean = $182,550), with an average ongoing operating cost of $113,500. There are many hidden costs. No common metrics yet to determine what information points to include
4. Understanding the existing and evolving human landscape. Work culture and practices, relevant social groups, interpretive flexibility, appropriation (how technology fits into the workplace and how it supports your culture).
Quoted Judson King's report from Institution for Studies in Higher Education, 2006 "Scholarly communication: academic values and sustainable models" -- "Approaches that try to move faculty and their deeply embedded value systems directly toward new forms of archival systems are destined to fail"
Conclusions:
*flexible and scalable repositories - Choudary and Martino 2005 "At Johns Hopkins, we are promoting the idea that applications should access repositories through an abstract, repository agnostic layer, rather than through custom application to repository integrations" see Cornell, as pioneers they ended up with many different repository systems (Greenstone,
*web services/service oriented architecture models - ex. file format migration, file obsolescence service, social tagging, citation analysis, text annotation, plagiarism detection (added to arXiv recently)
*repurposing - ex. Cornell making their digitized books available via Amazon print-on-demand
*new information chain - see Van de Sompel's D-Lib article on how the information chain is expanding
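The "abstract, repository agnostic layer" in the first conclusion can be pictured as an interface that applications code against. What follows is a minimal sketch of that idea, not the Johns Hopkins design: the class names, the two methods, and the in-memory backend are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical repository-agnostic interface. Applications depend on
    this, not on any particular repository system."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None:
        """Store the bytes under the given identifier."""

    @abstractmethod
    def get(self, object_id: str) -> bytes:
        """Return the bytes stored under the given identifier."""


class InMemoryRepository(Repository):
    """Stand-in backend; a real one would wrap an actual repository system."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self._objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


def copy_object(object_id: str, source: Repository, target: Repository) -> None:
    """Application code that works with any backend implementing Repository."""
    target.put(object_id, source.get(object_id))
```

An institution that ends up with several repository systems would write one adapter per system and leave application code like `copy_object` untouched.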
Labels: DigCCurr 2007
DigCCurr 2007: Migraine kills the plenary
I missed this morning's plenary because I awoke with a throbbing right eye which quickly evolved into a killer headache. More sleep and a few ibuprofen have lessened the impact but I'm still not quite up to a long day of sessions. I shall persevere and keep my fingers crossed that I feel better after food and caffeine.
The organizers of the conference are surveying attendees on the types of skills they look for when hiring digital curators. It made me think about the myriad areas in which a repository manager must be conversant. First and foremost, I think repository rats need people skills. So often our problems are not technical but political. Developing teams, collaborating across multiple institutions, convincing contributors to contribute to collections, and raising money all require schmoozing.
Second, I think repository rats need an appreciation of archival theory. I didn't glean anything about archives/archiving from my MLIS. I ended up going to UCLA for a CAS to get that specialization (*ed. note - I'm one credit shy of completing that certificate. I won't speak of why I quit UCLA, but if you truly want to know, I'll tell you in-person). While working on the CAS, I found the learning I did about authenticity and evidence in digital record keeping to be incredibly useful for explaining to people why archival control is necessary for some types of repositories and/or digital collections. Third come all of the technical skills, which are quite numerous. Running servers and databases, metadata creation and interoperability, and creating and testing websites only skim the surface. Finally comes financial and business acumen. Most repository projects have been financed with project funding. As repositories evolve from projects to production this type of funding is not sustainable. Just ask the folks at NDIIPP what happened when Congress changed. An understanding of the business case for your repository is crucial not only for pitching funding agencies but also for evaluating the success of your repository. The business case provides the measures of assessment.
I'm off in search of a decent coffee. I'm staying at the Marriott near the Friday Conference Center. It feels like I'm in the middle of nowhere. You can't even cross the street due to a lack of sidewalks and pedestrian signals. Nothing is really visible except a golf course and a housing development. Unfortunately, hotel room coffee is a bit weak for my espresso habit. There is a business park across the street and I've heard rumors of a bakery. Bakery = potential latte. Wish me and my pounding head luck.
Labels: DigCCurr 2007
***ed. note -- it's 4ish in the afternoon, my brain is dead tired, and the speakers in this session are incredibly quiet and mumble-y. these notes may be more raw than the others. ***
I attended "Designing & Implementing Repositories Across Institutional Boundaries"
Speakers: Mike Smorul, Bill Underwood, Richard Marciano
PAWN project
Michael Smorul
http://umiacs.umd.edu/research/adapt or Google ADAPT UMIACS
Problems facing ingestion
*reliable data transfer
*each producer/archive interaction is unique
*how the archive deals with each collection is unique as well
Distributed ingestion with PAWN
*multiple producing sites with different requirements
*separation of administrative responsibility
Components - showed network architecture diagram
Package work flow overview
1. create producer-archive agreement
2. client package template
3. create package based on template
4. once approved, packages can be archived
5. rejected packages can be held until rectified or deleted for resubmission
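The package work flow above reads like a small state machine. Here is a sketch of it under that reading; the state names and the transition table are my own interpretation of the five steps, not PAWN's actual code or terminology.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Allowed moves, read off the numbered steps. The state names are assumptions.
TRANSITIONS = {
    State.SUBMITTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.ARCHIVED},                  # step 4
    State.REJECTED: {State.SUBMITTED, State.DELETED},  # step 5: rectify or delete
    State.ARCHIVED: set(),
    State.DELETED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the work flow does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move package from {current.value} to {target.value}")
    return target
```

With this table a rejected package can never be archived directly; it has to be rectified and resubmitted first.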
Custom roles
*actions in PAWN can be grouped together to create roles (modify items in a package, create users, etc.)
*default roles
**producer
**records manager
**archive manager
**global administrator
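Grouping actions into roles can be modeled as a mapping from role name to a set of action names. The sketch below uses the four default role names from the talk, but which actions belong to which role is invented for illustration, as are the action names and helper functions.

```python
# Role name -> set of permitted actions. The role names come from the talk;
# the action assignments are hypothetical.
ROLES: dict[str, set[str]] = {
    "producer": {"create_package", "modify_package_items"},
    "records manager": {"modify_package_items", "approve_package", "reject_package"},
    "archive manager": {"archive_package"},
    "global administrator": {"create_users", "define_roles"},
}


def define_role(name: str, actions: set[str]) -> None:
    """Register a custom role as a named group of actions."""
    ROLES[name] = set(actions)


def is_allowed(user_roles: list[str], action: str) -> bool:
    """True if any of the user's roles includes the action."""
    return any(action in ROLES.get(role, set()) for role in user_roles)
```

A site that needs something more specialized would call `define_role` with its own grouping of actions.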
PAWN utilizes SRB from SDSC
Case study 15,000 CD-ROMs of LANDSAT data
Case study from SLAC @ Stanford, created specialized roles (records creator, records liaison officer, records manager)
William Underwood. PERPOS (Presidential Electronic Records Pilot System)
*initial objective, R&D project, develop tools to support archivists in gaining intellectual and physical control of PC records from the administration of George H.W. Bush
*contents of 500+ hard drives
*included operating system and software applications as well as user-created files
*DOS and Windows 3.1
PERPOS
*developed a prototype system to support accession, arrangement, preservation, review and description of e-record series
*evolutionary prototyping
*system has been pilot tested by archivists at the Bush Presidential Library
*several record series have been systematically processed
*FOIA processing currently being pilot tested
***found viruses in legacy data *** important to use virus checkers
Summary of research results and benefits
*supports both systematic and FOIA processing of presidential e-records
*provides an environment for experimental application of advanced information technologies to archival process
*document type identifier speeds up processing
*automatic description of items, file units, and record series enables earlier intellectual control of e-records.
*prototype access restriction checker
*knowledge acquisition reduces work required to apply access restriction checker to records of subsequent administrations
Richard Marciano, SDSC/UCSD
The perspectives of digital curators on building distributed repositories
Collaboration between digital curators and IT folks looking at how to make cost effective distributed repositories.
PAT = persistent archives testbed
2 yr NHPRC project, extended for 1 year
Project summary:
*participants were digital curators from libraries, archives, historical societies, scientific data environments, museums and IT researchers and staff
*main goal: design a distributed repository for electronic records management, demonstrate the management of various types of records with a common software infrastructure
*approach: each site chose an archival collection and set up access control and update permissions for its preservation environment independently of the other participants.
Presentation goals:
*comment: David Giaretta says "no repository is an island" ... PAT fits the archipelago model
*examine: lessons learned and skills needed by digital curators to automate archival functions (appraisal, accessioning, arrangement, description, preservation, and access of records), benefits achieved by using common infrastructure
PAT Community Grid
Local storage resources
||||
SDSC Archive
||||
MCAT Metadata catalog (Oracle), Shared preservation environment, Storage resource broker (SRB)
Unique contributions of digital curators to the infrastructure:
*Windows based SRB clients/servers
*Development of a Perl for Windows client library
*Bulk operations were developed, tested, and refined (registration, accessioning, metadata extraction from the records, metadata loading, validation of data movement into/out of the system/within the system)
*End-to-end work flows were developed (accessioning, replication)
*SRB bugs revealed: better reliability
*MCAT ported to MySQL (Oracle, DB2, Sybase, Informix)
*Development of a wiki for documentation
*Registration of filenames with unusual characters discovered and fixed
*Suggestions on ways to simplify governance issues tied to particular types of data management:
**need to express such policies as rules to be applied to the data management system.
**development of the next generation of data grid technology: iRODS (integrated Rule-Oriented Data System)
**Each preservation process is expressed as a set of micro-services (operations that can be performed using a remote storage system)
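To make the "policies as rules, processes as micro-services" idea a little more concrete, here is a minimal sketch in Python. It is not iRODS rule syntax and not anything shown in the talk; the event name `on_accession`, the record layout, and both micro-service functions are hypothetical names chosen for illustration.

```python
import hashlib
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
MicroService = Callable[[Record], Record]


def compute_checksum(record: Record) -> Record:
    """Hypothetical micro-service: attach a SHA-256 checksum of the content."""
    digest = hashlib.sha256(record["content"]).hexdigest()
    return {**record, "checksum": digest}


def mark_for_replication(record: Record) -> Record:
    """Hypothetical micro-service: flag the record so a second copy gets made."""
    return {**record, "replicate": True}


# A policy written down as a rule: an event name mapped to the ordered
# micro-services that should run when that event happens.
RULES: Dict[str, List[MicroService]] = {
    "on_accession": [compute_checksum, mark_for_replication],
}


def apply_rules(event: str, record: Record) -> Record:
    """Run every micro-service registered for the event, in order."""
    for service in RULES.get(event, []):
        record = service(record)
    return record


accessioned = apply_rules(
    "on_accession", {"name": "minutes.txt", "content": b"hello"}
)
```

The point of the design is that changing a policy means editing the `RULES` table rather than the code that moves the data around.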
What Digital Curators Liked
*leverage common software and hardware
*use commodity storage hardware
*lower the cost of participation
*reduce the level of expertise required at each site
*focus on management of the archival collections and outsource the details of the archival repository
*automate the manipulation of collections to minimize the level of effort
Conclusions
*PAT suggests that sustainability is probably beyond the capability of most archival repositories (costs of tracking new types of technology, expertise to manage, costs of storage systems and databases)
*outsourcing of the management of records is feasible through use of data grid technology
*preservation environments can be assembled by creating regional community archival partnerships with university data centers (yes, there are still many political barriers)
*independence can be maintained
*service agreements for storage and preservation of archival e-records are needed
Labels: DigCCurr 2007
DigCCurr 2007: Afternoon plenary: What is digital curation?
Speakers: Peter Buneman from U.K. Digital Curation Centre & William Lefurgy from NDIIPP.
Peter Buneman: Databases & Digital Curation
Databases in science and scholarship:
*Nearly all branches of science depend on database technology for storage and retrieval of data.
*This has changed the scientific method (Mike Lesk)
A curated database is:
*A reference work
*Value lies in the organization and annotation of data
*Commonly constructed by copying parts of other (curated) databases
*Replacing traditional dictionaries, gazetteers, encyclopedias
*Rapidly increasing in scientific research (>800 in molecular biology)
*Catalogs and archival metadata are usually curated.
*Constantly checked/verified. Data quality and timeliness are important.
*Often a group effort. Produced by a dedicated organization or as a collaboration.
*Labor intensive
*Increasingly seen as "publications" by scientists.
Compare traditional libraries vs. databases
Storage in libraries is: redundant, persistent, distributed, readable by people, with clear standards for citation, a historical record, and a well-understood legal status. Databases? Not.
Example of the CIA World Factbook and plotting the population of Liechtenstein in 1990. Are you better off online or in a library?
Research on Provenance:
Very difficult and long term problem
Database preservation:
How do you preserve something that evolves (both in content and structure)
Snapshots, if frequent, are time consuming; if infrequent, you miss something.
Snapshots are immediate, and longitudinal/temporal queries are easy.
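One way to picture the alternative to whole-database snapshots is an archive that records only what changed at each version and can still answer "what was the value at version N?" The sketch below is my own illustration in Python, not the technique presented in the talk; the class name, method names, and the sample values are all made up.

```python
from typing import Any, Dict, List, Optional, Tuple


class VersionedArchive:
    """Keeps every value a key has ever had, tagged with a version number."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[int, Any]]] = {}
        self.version = 0

    def commit(self, changes: Dict[str, Any]) -> int:
        """Record only the keys that changed; return the new version number."""
        self.version += 1
        for key, value in changes.items():
            self._history.setdefault(key, []).append((self.version, value))
        return self.version

    def value_at(self, key: str, version: int) -> Optional[Any]:
        """Return the value the key had as of the given version, if any."""
        result = None
        for committed_at, value in self._history.get(key, []):
            if committed_at > version:
                break
            result = value
        return result

    def history(self, key: str) -> List[Tuple[int, Any]]:
        """Return the full (version, value) timeline for one key."""
        return list(self._history.get(key, []))


# Illustrative, made-up figures; not real Factbook data.
archive = VersionedArchive()
v1 = archive.commit({"population": 100, "capital": "A"})
v2 = archive.commit({"population": 110})
```

Here `archive.value_at("population", v1)` still answers with the old figure after the second commit, and `archive.history("population")` gives the timeline a longitudinal query would need.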
How do you cite something in a database? Many scientific databases ask you to cite them, but they
*don't tell you how, or
*they tell you to give the URL, or
*they tell you to cite a paper about the database.
What is a citation? Location and descriptive information.
Getting a canonical version of the database, borrowed data set from nearby scientist. The first task was to convert the db into a hierarchical structure. Preserve all versions of the data, generate stage web pages (less software, more efficient).
Able to clean-up the data.
Created a book from the database.
What is digital curation? A unified approach to
Preserving - the process of preserving digital data for future use, once it has been created
Creating/maintaining - missed last part of this second bullet...
Impertinent thoughts on a DC curriculum. Should they involve databases?
Yes, but they need not be intrusive or hugely time consuming. Teach data formats first and use that as an introduction. No need to teach internals or optimization. Teach design through data import/export technology and through schema mappings. Provide a short course in semistructured data and ontologies.
Do internships with data publishers (they need help!).
Other things to think about
*legal aspects (copying)
*security and confidentiality in databases, timed embargoes
*economics of long-term database maintenance, open access
*Combine with or borrow from, other curricula (ex. NSF data integration)
William Lefurgy from NDIIPP. Digital Curation and Sustainability
Will focus on economic sustainability because he thinks more curators need to pay more attention to that aspect of digital curation.
Aspects of Sustainability
*to keep in existence, to maintain or prolong
*meeting the needs of the present without compromising the ability of future generations to meet their own needs.
*resources, broadly defined, for keeping digital materials available and accessible over time (technology, staff, cash)
*concept application at different levels (national, consortial, local, etc.).
Wants to focus on the third bullet - resources. Brian Lavoie has pointed out that we need an economic infrastructure as well as a technological infrastructure. Example of LC's assumption that the money they had available would remain available. Not so. Their funding was revoked when Congress changed.
Sustainability closely linked with other key issues: collecting content, developing technology, and outlining public policy.
Preparing for substantial discussion of sustainability in the final report of NDIIPP.
Many projects are recognizing the issue. ARL "To stand the test of time" etc.
The need is clear
*expanding digital stewardship requirements
*infrastructure, capabilities still largely geared to analog
*digital funding is largely project based
*broad range of work necessary to effectively manage content across life cycle
*rapid change means regular migration of data, systems
Open questions
*how to preserve, make available
*how to transform existing stewardship organizations and practices
*what are the costs?
*who pays?
*why it matters
See
http://www.arl.org/bm-doc/econ_models.ppt for a "mind-map" of the economic issues
Basic action items
*make content value explicit
*probe business case elements
*explore business models
Content value/Why should I care?
*values of digital materials are typically intangible, as is the material itself
*funders need clear, concrete evidence for importance of digital content
*must clarify the demand side. what values accrue as a result of preservation? what are the deficits if the content goes away?
Content value clearly explained
*need frameworks to consider dimensions of value for various digital materials, e.g. value for institutional users, value for institutional reputation, prestige, value for posterity
see British Library "contingent valuation"
Business case: risks, fixes, costs
*compelling story about risk
*incentives/barriers
*plan for addressing risk
*some estimate of cost
*value added by the curatorial practices
The how and how much
*needed level of service, e.g. bag and tag, transformation, disaggregation, rich metadata
*prospective work flow
*credible cost estimates (see City College of London??)
Business models
*how to put preservation into operation
*provide for resources on an ongoing basis
*leverage incentives, remove barriers
*emergent models, but experimental at this point; modeling and testing appropriate
*collaboration is key
see LOCKSS/CLOCKSS and Portico for emerging models
Working within a network
*no one institution, community or sector can develop the best solution, collaboration is essential
*networks build shared infrastructure, reduce costs
*repositories will vary but all can draw from shared suite of tools, services, best practices.
Self-interest and the public good
*institutions work together in pursuit of individual net positive value
*key to sustainability: members get value from networks, but benefits accrue to all from exchange of knowledge
Summary of collective needs
*work to illuminate content value for decision makers
*make the case for specific curatorial actions with supporting cost data
*implement and test models
Labels: DigCCurr 2007
DigCCurr 2007: Identifying Digital Curation Services and Functional Requirements
I attended "Data Sets, Metadata, and Management"
Speakers: Jane Greenberg, Gail Steinhart, James Tuttle
Jane Greenberg. DRIADE project. Overview: introduction, consensus building, functional requirements,
DRIADE = Digital repository of Information and Data for Evolution.
Big science initiatives already with webbed interfaces to the data and lots of services built on top. The internet also impacts "small science" such as Knowledge Network for Biocomplexity (KNB), Marine Metadata Initiative (MMI).
Evolutionary biology has some requirements for data deposit (GenBank, TreeBase). Some publications also require supplementary data such as Molecular Biology and Evolution. No single one-stop shop for evolutionary biologist. DRIADE was developed in response to that need.
Goals of DRIADE
* develop one-stop shop for scientific data objects supporting published research
* support data acquisition, preservation, resource discovery, data sharing, and data reuse of heterogeneous digital datasets.
* balance a need for low barriers, with a higher-level data synthesis.
Consensus building
Had a stakeholders workshop where they invited reps from the major journals, organizations/societies, and scientists.
Outcomes: found there was unanimous support for the project, participants felt it was necessary to advance science in this field. It was agreed upon that they would have a central data repository. Journal representatives felt they had a moral obligation and moral authority to initiate this and get people on board. A data repository will help to verify authenticity of data and provide a bit of "policing" in terms of data security. It can also help with interoperability with Genbank and Treebase, working on "handshake activity."
Challenges: scope, representation, quality control, security, cultural change, sustainability.
Scope: should it be restricted to data supporting publications? Should other data be included? One participant advocated for doctoral theses. Who will contribute? Who will create the metadata? At what stage in the publication life cycle should data be contributed? How to coordinate with the journals?
Rights are a very important issue. Rights of authors to their datasets. Do journals have rights to publish data collected with public funds? General consensus from the workshop leaned towards a model like Creative Commons. At what stage in reuse do re-users credit the creator of the data?
Representation has many associated challenges. Standardization and strict adherence to standards vs. keywords. A combination of both is a practical way to approach it. Looking to generate as much metadata as possible automatically. Some experiments in drawing keywords from the published article.
Quality control of the data input: how to maintain it? People at the workshop were very against the idea of a data curator initially, but by the end of the workshop they were thinking a data curator (or more than one) is necessary for quality control.
Security is an issue because evolutionary biology is a controversial subject (think creationists). Need to protect.
Sustainability? How do you encourage scientists to deposit data? DRIADE is fortunate in that there is buy-in from the journals. Some discussion of mandatory deposit policies leading to passive-aggressive behavior (like mislabeling tables, omitting data, etc.). There will be a need to "flag" incorrect items. Finally there is an issue with the ongoing funding model. Should it be subscription based? Grant funded?
Priorities and next steps: preservation, access, synthesis (Maslow's hierarchy of needs), cultural change via editorials, publicizing at conferences, requirements. Right now they consider the preservation itself to be the highest priority because there is a lot of data being lost.
Functional requirements: compared other small science projects.
Support: computer-aided metadata generation, specialized modules linking data submission to work flow.
Functional model based on OAIS.
Metadata framework.
Level 1 - initial repository implementation - preservation, access, and basic usage of data.
Level 2 - ??
Level 3 - ??
Application profiles: data elements drawn from one or more name space schemas combined together by implementors and optimized for a particular local application. Single existing schemes are often not sufficient.
Level 1+ bibliographic citation for the journal article, data object metadata (incl. PREMIS)
Level 3 brainstorming: thinking of web 2.0 technologies, personalization, macros, tagging etc.
Conclusions: Team work required. Stakeholders meeting was critical. They are benefiting from prior work. Next steps will be to survey and do use-case and life-cycle studies. Metadata application profile experiment with evolutionary biologists actually using it.
Implications for education: students participate in the project, service learning is invaluable. Curriculum needs to address the whole picture including digital resource life-cycle, metadata life cycle, IA components, human factors. Language barriers and communication skills ...different vocabularies in different domains. Conferences like these.
Gail Steinhart described her work with a research group at Cornell.
Overview: motivation, strategy, and what we have learned so far.
Motivation: don't need to spend too much time with this audience explaining why digital curation is necessary. There are digital preservation issues and there is information entropy (loss of information about data over time which lessens its usefulness). Mentioned NSF Cyber-infrastructure Vision for the 21st century. Digital curators need to stay plugged into those. She attended the funders perspective concurrent session this morning and it is more and more evident that funders are going to require data curation.
Curation definition from DCC. She does not think that academic libraries are going to do all aspects of curation as per that definition (DCC definition includes processes needed for good data creation and management, and the capacity to add value to data to generate new sources of information and knowledge). Libraries should try to fill in gaps in infrastructure and support but this is not without costs. Engaging in this area will require libraries to develop new partnerships.
The Cornell project involves multiple departments and units (animal science, biological and environmental engineering, ecology, horticulture, Mann Library, etc.)
Types of data collected: observational data, experimental data, simulation models. The group also has 30 years of historical observational data that they would like to preserve and share.
They want to share (a) between themselves to create the simulation models (b) public good to share with policy makers and the general public (c) PI's have committed to being a model for shared collaboration. They appreciate the utility of their data for others.
Strategy:
Didn't make sense to develop infrastructure for this one small group. Needed local support of data, metadata.
Project participants share data and metadata via a staging repository provided by the library. When they are ready to make it public they have some choices: Cornell's institutional repository vs. discipline repositories.
Staging repository
*use discipline specific metadata standards and tools (Ecological Metadata Language [EML]), Morpho
*Provide a place to share pre-publication data within the group.
*Provide training and recommendations on metadata
EML
Morpho is a metadata editor that makes it pretty easy for P.I.'s to create metadata. Helps non-librarians to create a metadata record. Metacat?
"Publication" of data options
*DSpace/Cornell institutional repository... but this doesn't add to the science infrastructure so they encourage
*submit metadata (and possibly data) to discipline specific repository (KNB, other?)
Test case: Historical
Observational data from past 30 years
Original format Quattro Pro workbooks with multiple pages.
Converted to Excel for further review and clean-up
Various errors (apparent duplicate records, misaligned columns, out of range values)
Missing or ambiguous information (methods, units, geographic locations)
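The kinds of errors listed above (apparent duplicates, out-of-range values, missing information) lend themselves to simple automated checks before a person reviews the data. This is a minimal Python sketch of what such checks might look like, not the procedure used at Cornell; the field names (`site`, `date`, `water_temp_c`), the sample rows, and the acceptable range are invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def find_duplicates(rows: List[Row], key_fields: Tuple[str, ...]) -> List[tuple]:
    """Return each key combination that appears in more than one row."""
    counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def find_out_of_range(
    rows: List[Row], field: str, low: float, high: float
) -> List[int]:
    """Return indexes of rows whose value is missing or outside [low, high]."""
    flagged = []
    for index, row in enumerate(rows):
        value = row.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(index)
    return flagged


# Made-up sample rows: one duplicate, one implausible value, one missing value.
rows = [
    {"site": "A", "date": "1990-06-01", "water_temp_c": 18.5},
    {"site": "A", "date": "1990-06-01", "water_temp_c": 18.5},
    {"site": "B", "date": "1990-06-01", "water_temp_c": 185.0},
    {"site": "C", "date": "1990-06-02", "water_temp_c": None},
]

duplicates = find_duplicates(rows, ("site", "date"))
suspect = find_out_of_range(rows, "water_temp_c", -5.0, 40.0)
```

Checks like these only flag rows; deciding whether a flagged row is really wrong still needs the data owner.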
Extensible model??
Lots of work between the data owner and her. It raises the question of whether this level of service can be provided by libraries.
Summary: curation skills
*traditional library and archiving skills (metadata, preservation, interoperability, appraisal, and selection)
*understanding content area ... need to understand how researchers in a discipline really do research.
*awareness of standards and tools related to data.
*productive partnerships with researchers.
Question: how do contributors work with controlled vocabulary? Answer: haven't had to face it yet. It's too new for the researchers. Had to begin with teaching them what metadata is, what EML is etc. EML doesn't require controlled vocab but can accommodate it. She suspects that the reality is that they're going to do what they are going to do and we have to accept it because it is better than not getting the data sets at all.
James Tuttle, NCSU, Curation and preservation of complex data NC Geospatial Data Archiving Project
In his experience geospatial researchers value the newest data and ignore older data when newer data becomes available.
Geospatial data types are complicated. Vector data, for example. It's highly difficult to preserve. Aerial imagery is a little simpler. Spatial databases are incredibly difficult to preserve. Can do one-many export of images and vector data but trying to manage the relationships between them is difficult.
Repository Pre-ingest work flow
Data receipt - format processing - metadata processing - ingest processes. The process is as much social as technical.
Data receipt: includes acquisition, reorganization, validation, threat analysis, inventory. Try to automate wherever possible. They have no demands on contributors, so files come to them "as is."
Using JHOVE tool to harvest, even though not designed for geospatial data.
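The inventory part of data receipt is the easiest piece to automate. As a rough sketch (not the NCGDAP work flow, and not a substitute for the format checks a tool like JHOVE does), the Python below records a checksum for every file received and reports what changed between two inventories; the function names are my own.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_inventory(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    inventory: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            inventory[path.relative_to(root).as_posix()] = digest
    return inventory


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or altered between inventories."""
    names = before.keys() | after.keys()
    return sorted(name for name in names if before.get(name) != after.get(name))
```

Running `build_inventory` when files arrive and again after processing, then comparing with `changed_files`, gives a simple check that nothing was silently altered or lost along the way.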
Format processing: geospatial data traditionally has not migrated well between formats. Processing includes conversion, compound formats. Typically data is gathered without metadata. Some metadata can be generated from the GIS software. When it exists it typically requires remediation to be used in basic retrieval.
Ingest Processes: metadata conversion, SIP creation.
Extended Curation: feedback loop with contributors, constant improvement on metadata being received. Also work with industry and standards organization (geospatial consortium)
http://www.lib.ncsu.edu/ncgdap/
Question: Have you met with resistance to depositing from contributors?
Answer: Some agencies in N.C. had strict data sharing requirements which impeded our ability to use the data. A lot of local agencies have liability concerns with older data which may be obsolete, superseded, etc. Need to have clear disclaimers. Have to work with providers to reassure them.
General questions:
Comment: general theme of complex/compound objects in the papers, but the presentations bring home the need for people skills. How to talk to people who don't use our language re: metadata? How do you develop that skill set in students so they can speak to researchers about their data sets? All three of the presentations mentioned an education component where they had to explain to researchers about metadata and its function. How can curators be good educators?
Question: In terms of preparing digital curators what level of expertise do we expect them to have in different content areas?
Jane Greenberg: It's a very interesting question. They have a discipline post-doc working on their project. She recalls a time as a cataloger where colleagues had graduate degrees.
Audience member: when it's a general description of blob object then it's not as important to have the discipline knowledge, but it's more important to know what's actually in the data.
Jane Greenberg - important to teach curators to know when they need a domain expert so they don't get themselves into situations where they aren't qualified.
Labels: DigCCurr 2007
DigCCurr 2007: What do digital curators need to know?
Each concurrent session time has a different theme. The theme of session one is "What do digital curators do and what do they need to know?" I went to the "Research Perspectives" session. I was a bit disappointed because I was expecting to hear about hot research questions but the discussion was interesting nonetheless.
Speakers: Hans Hoffmann, Phil Eppard, David Giaretta.
The speakers described their associated projects. Hoffmann is involved with the Planets project, Eppard described the work of InterPARES 1 and 2. Giaretta, manager of the CASPAR project, spoke more of what digital curators need to know.
Most of the detail that Hoffmann and Eppard discussed I'm familiar with from reading about the projects over the years.
Giaretta was incredibly amusing. What follows are my "raw dump" notes. I'll have to summarize and comment at some point, but fwiw, here are my notes.
Concurrent Session - Funder's perspective
Hans Hoffmann. Described the PLANETS project. A European initiative.
Components: planning services, characterization services.
Interdependencies between all of the components.
Preservation planning. Come up with a process to identify what should be done with the digital objects for which you are responsible. Criteria for preservation based upon organizational policies, collection profile, provenance of digital objects (authenticity).
What are the best available preservation actions given the criteria? Develop a plan. Ideal is to make it an automated process. Requirements should be proactive rather than reactive.
Preservation policy, content profile, usage profile, and actions inform the plan.
Plan will be executed on the content of your repository.
Characterization of objects can take two approaches. Intellectual approach, building objectives trees based upon utility analysis and extraction of intrinsic file (format) information.
They are trying to develop a description language to match the two approaches.
TNA PRONOM file-format identification used to define a characteristic language, define an extraction language, define a pluggable interpreter.
Preservation actions: two approaches: transform content/objects and transform environments (migration, emulation). Content objects: wrap third party transformation tools, ...preserve relational databases.
Testbed environment will help them determine what works. Developing a corpus of objects. Performing experiments on it.
The testbed consists of: data storage, hardware, PLANETS software, testbed software...
Interoperability framework.
What do digital curators need to know? Preservation planning, how to identify what criteria should inform decisions, how to apply those criteria to digital objects, how to test and evaluate available preservation strategies with respect to a given type of object. How to do it in an effective and efficient way.
Training programs based on Planets results. Coming up with a modular approach to bring together course materials already in existence and building on the work of ERPAnet?
http://www.planets-project.eu
Questions: More about the criteria for judging which approach is best? You need to know what your collection is about, what are the characteristics of the digital objects? When you use migration, for instance, will it be the best solution? Emulation? The authenticity requirements for the document or record are needed and they are based on the business requirements of the collection and the context of creation.
Could you envision a situation where the requirements would come from the context of use? Hoffmann: Yes, that's why we're doing user studies. How are they using the digital objects? What will it tell us about the need to preserve digital objects?
If you have multiple users going after the objects in different ways would there be different criteria for different needs? Hans Hoffmann -- you have to deal with the object how you receive it. The object and how you use it are two different things. Add services to the repository based on the use. You try and evaluate and revise.
Phil Eppard on InterPARES 2 Project.
Many InterPARES researchers here in the room. Investigating the complex issues in the preservation of digital materials. InterPARES has a very long history. PE provided an overview of history and scope and some of the InterPARES products.
Started at UBC with authenticity of records project 1994-97, concerned with creation and maintenance of records in their active phase. Product of that research was DoD electronic records standard.
1999-2001 InterPARES 1. 13 countries, 4 continents, 60 researchers. Included practitioners and experts in c.s., law, and policy studies. Focus was on records as defined by archival science. Theoretical principles based on archival theory and diplomatics (the study of creating and identifying authentic records).
Used case studies. Through the case studies used a template for analysis developed via diplomatics. Key product was two sets of activity models for the functions of selection and preservation and a framework for assessing and maintaining authenticity. Benchmark requirements supporting the presumption of authenticity and baseline requirements supporting the production of authentic copies of electronic records.
Not preserving the records themselves so much but the ability to reproduce the records in an authentic form.
Benchmark requirements: maintain expression of record attributes relating to identity and integrity, control access privileges, protective procedures to prevent loss or corruption of records, procedures to prevent media degradation, procedures for maintaining documentation.
A preserver looking to take over a set of electronic records would test them against the benchmark and this may influence an appraisal decision.
Baseline requirements: maintain controls over records transfer, maintenance and reproduction, retain documentation of reproduction process and its effects, capture ...?
2002-2006 InterPARES 2
Expanded interdisciplinary team adding researchers from various sectors of the arts and sciences to the team of archivists, preservationists, etc.
Focused on newer types of electronic records: dynamic, interactive, experiential.
Develop understanding of their creation, maintenance, and preservation.
Research domains: records creation & maintenance, authenticity, accuracy and reliability & methods of appraisal and preservation.
Focus areas: arts activities, scientific research activities, and e-government.
Cross-domain research groups: description (metadata), modeling, policy, and terminology.
Created a dictionary of terminology available to the public as a database.
Key products: manage chain of preservation model (preserver centered), business driven record keeping model (records creators, business centered), principles for records creators and preservers (for policy development rather than principles of preservation), guidelines for digital records preservation (operationalizing process for practitioners), guidelines for individuals, Metadata and Archival Description Registry and Analysis System (MADRAS), terminology database.
MADRAS is a key product. A web-based tool for developing, registering, and evaluating metadata schemas and archival description standards. It allows people to compare schemas as to how well they meet international standards and guidelines (such as the benchmark requirements).
InterPARES and Digital Curation: training new researchers and educators, case study methodology and examples, integrating preservation with other processes, metadata schema and analysis, policy recommendations.
Question: will you offer counseling to universities who want to use your methodologies?
PE: InterPARES 3 selected effort to work directly with repositories to test and implement some of the products of previous InterPARES work.
David Giaretta, CASPAR Project manager
CASPAR = Cultural, Artistic, and Scientific knowledge for Preservation, Access, and Retrieval.
What digital curators do: Struggle with: funders (reluctant to provide long-term commitment; cost control, cost estimates), Information providers (unwilling to provide what is needed, ways to capture required info), Users (increasingly demanding).
CASPAR a large consortium
http://www.casparpreserves.eu
What do digital curators need to know? They do preservation and publication/access but do not confuse them.
Needs of access: responsive, sophisticated search techniques, users often familiar with the material.
Needs of preservation: ensure the information trapped in the bits is authentic and understandable -- to the designated community (this also implies making it fit for the purpose, adding the info).
Disincentives for preservation: Cost, Time.
Can sell preservation as benefiting access. Cyber-infrastructure allows users to find and try to use data from many sources. Some of these will be familiar but most will be unfamiliar. How can one be sure that the unfamiliar data is used correctly?
Need understanding: garbage in, garbage out.
Digital preservation is terribly easy to do.... as long as you can provide money forever. Easy to test claims about tools...as long as you live a long time.
Know what is being preserved: the great data/document divide. Need to preserve information & knowledge -- not just "the bits." Documents, videos are rendered -- simple? Data must be processed in new ways ... this is harder.
Information is the important thing. What information? documents, data. Original bits? Look and feel? Behavior? Performance? Explicit/Implicit/Tacit.
Things change/disappear -- how can we ensure that the information trapped in the "bits" remains understandable despite all these changes? Example of Google changing a style sheet and messing up the RSS. The network links to related information may be important.
Time is short. Neither you nor your institution will last forever. The chain of preservation is only as strong as its weakest link. Need to be prepared to hand over responsibility for the preservation.
No repository is an island. Your organization can not do everything. Must tap into other resources -- how can we find them and evaluate those resources.
We can not foretell the future. Need to manage knowledge to keep archives alive through time. Preservation is a process not a one time event. Preservation is expensive.
OAIS. Know more than the functional model diagram. The information model is key. With data especially you need to know the semantics (context).
Authenticity - evidence, evidence, evidence.
Support infrastructure: registries of representation information, representation information gap manager, orchestration manager, toolkits (representation information; preservation description information).
CASPAR aims to produce tools and techniques to support digital preservation and make it easier to share the cost. Must be relatively easy to use, must have a low "buy-in" in terms of effort required for adoption, must avoid requiring wholesale change of everyone else's systems, must be decentralized and reproducible so that it can live.
How can you tell who is selling preservation snake oil?
How to decide? Validation: demonstrate theoretical basis. Accelerated lifetime tests (changes in hardware, environment, and changes in designated community). Demonstrate increased trustworthiness, measured using Certification process as/when available.
http://wiki.digitalrepositoryauditandcertification.org (NARA work to produce ISO standard development)
Question:
One problem with OAIS is defining the designated community. What do you do when your archive, under law, has to serve everybody? Answer: State assumptions of what your community should already know in order to use the archive.
Anne Gilliland asked what type of skills they expect people taking these positions to have. Eppard: Management skills and people skills. Gilliland: it goes back to developing the curriculum.
Marchionni (from his notes) - people need to know about the different models of preservation, about their communities and how to monitor changes within them, about appraisal as a continuous process rather than a discrete event, about the decision-making process, and about fund raising.
Hoffmann - It's context related. I work in archives. Libraries may require a different understanding. Tools are applied differently in different contexts.
Labels: DigCCurr 2007
DigCCurr 2007: Opening plenary
Helen Tibbo informed us of the proper pronunciation of the conference name. It's DIGH-seek-er. That's a relief. Now I won't embarrass myself because I had no idea how the cool kids were saying it. Attendance here has exceeded all expectations. There are 280 attendees and the organizers were only expecting 100 or so initially. The plenary room is packed and we have an overflow room. There is a conference wiki where the conference can be live blogged and chatted.
www.ibiblio.org/jewel/digccurr2007/pmwiki/
Labels: DigCCurr 2007
I need a new motherboard for my laptop. I use a Dell Latitude D610 and it's...a Dell...
I think universities get good contracts for Dells. I've used one for the past ten years and I've never been particularly impressed by the performance.
I've had this baby for 2 or 3 years now. It has never docked properly into its desktop station. The screen configuration settings are supposed to switch between the desktop station monitor and the screen on the laptop. Never.worked.
We tried many things and even called Dell, but for the past two years I've been manually switching the configuration settings every time I docked. Annoying, but live-able. Now the laptop won't dock at all. It won't recognize the power supply when it's input through the docking station.
Dell says it's cause I need a new motherboard.
Bad timing.
I leave for N.C. and DigCCurr tomorrow and I was intending to blog the sessions I attend. I have to leave the laptop at work so the Dell guy can arrive "either today or tomorrow" (shall I hold my breath?). I will probably take M's PowerBook so I won't be computer-less.
I am decidedly NOT a Mac person. I know, I know. All the cool kids love their Macs. I'm one of those weird people who have a hard time switching back and forth. I have problems telling my left from my right so it does mess me up to have the windows buttons on the opposite side.
I will still be blogging, it just may be slower than I intended.
Labels: DigCCurr 2007
I'm thrilled to announce that I have accepted the Metadata Services Manager position at the California Institute of Technology.
There are many reasons to be excited. First, Caltech Library is innovative. Second, I will be reporting to Eric Van de Velde. I greatly respect Eric and feel we'll be able to successfully tackle the challenges facing library systems and technical services.
Third, Caltech was one of the first libraries in the country to create repositories and they have a very successful and active repository program. I will not be directly involved with the repositories, at least initially. My job will be to reposition the cataloging department into a Metadata Services department. It's an open question as to how a Metadata Services department evolves and develops to best fulfill the Library's mission. Metadata Services are integral to repositories so I can foresee some involvement in the future.
Labels: Caltech, MPOW
As I am reading the new draft ch.3 of RDA, all I can think is, "how the $*%& am I going to train people how to use this thing?"
And I love metadata. Picture how it would read to somebody who doesn't thrill to the notion of cataloging. Picture how reading it will feel to a new hire in a formerly-known-as-cataloging department or a library-school student. I have to confess, I skimmed the AACR2 when I was in Gloria Leckie's kick-ass cataloging class back in my lib-school days. It really is meant to be a reference book digested in wee pieces. I'm not suggesting that newbies read it wholesale like the current reviewers are doing with RDA. Yet one needs a mental model of what the whole "book" is about in order to understand how to use it. At least for me. I'm a visual thinker.
If ever a text required a visualization, the AACR2 AACR3 RDA does. It's difficult for me to digest the many and varied connections between RDA and other standards. I'm constantly flipping back and forth between FRBR, ISBD, FRAR, FRAD, etc. I'm glad I can print them out at work and I don't have to pay for the printer ink on my own dime. And don't even get me started about carrying them to ALA for the CC:DA meetings.
I do have thoughts on what I've read of the rev. Ch. 3 so far. Oh yes indeedy do. I need to clean them up and clarify a few things for myself before I comment publicly. Mostly I want to get caught up with NGC4LB and RDA-L and make sure I add value to the discourse.
Labels: metadata, RDA
An early release of Sophie is available. Shout outs to Karen at Free Range Librarian for bringing it to my awareness. Now I have to take action on my book rant. I shall post an invitation for IR managers to play with Sophie along with me once I've got it installed and running and networked somewhere.
I haven't been reading feeds for the past few days (life trumps blogging..LTB). I've probably got a dozen announcements in my aggregator but hey, I read Karen first. If one has to prioritize feeds, Karen's is the creme de la creme.
I will attempt installation on both Windows and Linux by this time next week (If I blog it, may I hold myself to it. Beats reading RDA...)
I still haven't played with Archivists' Toolkit either. I've got many good reasons for this lack of free time for library tech playing but I can't yet divulge. Rest assured it's all interesting and good.
Labels: book, sophie
b.o.o.k. & RDA
I've ranted about the notion of a book on Institutional Repositories. Since writing that rant, I've had some publishers contact me to elucidate the advantages of using a professional publisher, namely: a close read and suggestions for revisions, publicity, and experience with distribution.
Of course I'm not dissing publishers and editors. I recognize the value they bring to the publication process. The point of the rant is my opinion that literate people will need to radically reconceptualize our collective notion of the book in order to make full use of books of the future. For librarians, this should go hand in hand with our use of FRBR and RDA.
I've been procrastinating about reading the recent release of its draft chapter number three.
Even though I'm a trained cataloger, I still struggle with catalogerese. And it's not the most scintillating of reads after a long day's work. I'm purposefully avoiding the RDA discussion list and the NextGenCatalog space, just so that I can form my own opinions while I read it.
I'm also beginning task force work for CC:DA on internal and external communication. It should be interesting in this time of flux to take another look at that.
For what it's worth, I'm firmly in the Coyle/Hillmann/Weiss train of thought when it comes to all things RDA. They state the issues far more eloquently than I could. Once I've got the draft chapter under my belt, I'll write out my thoughts.
Labels: book, metadata, RDA
I extend a hearty congratulations to my former colleagues at UCSB's Map and Imaging Library on being named in the top 10 Models of Technology Innovation according to a survey done by the ACRLog bloggers.
It was an honor and a privilege working with you on the ADEPT educational adaptation of the ADL: Larry Carver, visionary; Mary Larsgaard, map cataloging guru; Dave Valentine, programmer; Linda Hill, geo-spatial indexing specialist; Greg Janee, programmer; and of course Terry Smith, Jim Frew, Chris Borgman, and all of the research PIs. I'm sure I'm forgetting others. All of the staff of ADL are very deserving of this recognition.
If you haven't had a chance to play with the Alexandria Digital Library project, I highly encourage you to take a look. It's beyond super keen-o.
http://diva.sfsu.edu/help/about
I suppose it's natural that I like something named diva. Deja vu or something.
From the about page:
DIVA is the Digital Information Virtual Archive, a web-based file management solution for use by faculty and researchers in higher education. DIVA provides always-on storage, organization, recovery, version-tracking, and sharing/dissemination capabilities for campuses across the CSU.
Way cool. It really warms the cockles of my heart because I used to work directly on stuff like this.
Labels: Convergence, DIVA, MPOW
I just finalized my travel details. I'm heading to DigCCurr 2007 rather than going to Computers in Libraries. It was a tough choice. I went with DigCCurr because it's about training people for digital curation. The cataloging departments of yore no longer serve our purposes, IMHO. I'm currently charged with thinking about new directions for the bibliographic access services department at MPOW. I suspect that the description and preservation of locally produced resources will be our focus as we move to shelf-ready monographs. DigCCurr is simply more relevant to me at this point.
Transforming cataloging departments is difficult. The cataloging departments I've observed (including MPOW) are filled with legacy staff who've worked for the institution for 10+ years. The staff are incredibly process-driven. Without step-by-step procedures they can feel lost. They are taught not to think for themselves. This is the complete opposite mind-set from what's required to build new services.
You could fire people or lay them off, but that's a morally icky choice as far as I'm concerned. You can transfer them to different departments, if those opportunities exist. Otherwise you need to reframe and retrain. Breaking that I-must-ask-my-supervisor-about-every-step culture won't be easy. I think many cataloging departments avoid the problem simply because the human resource issues are so overwhelming. I probably need to think about this more carefully rather than making from-the-hip judgements like those above. I do believe that a radical overhaul of how cataloging departments do business is necessary; I just haven't articulated the arguments too well.
I've been catching up on work-work since giving my presentation to the Cal State University senior research officers last Friday. I finally managed to put the presentation up at my work web site. I present for your reading pleasure:
"Supporting scholarship: issues, opportunities, and service development." Feel free to steal whatever you'd like from it. I know I borrowed liberally from the papers and news I've been reading lately (with attribution in the notes).
To provide context: The CSU research officers are provosts and vice-presidents.
They recently drafted a white paper about the role of research within the Cal State system. The gist of it is that they want to promote research more since it's integral to our mission as a teaching/learning centered institution. I gave them a lot of feedback on their draft document. The officers know there are impediments to expanding research and they are trying to address those issues. Go them!
I'm sorry the presentation is in PowerPoint. I know I said I was going to take a more demonstration-based approach. As I got into writing the presentation I realized there was a metric ton of material to discuss, which necessitated the slide-based approach. I'd like to get into using Slidy or something more webbed. It's a matter of balance. Do I want to spend time on the newer applications when the old familiar software does the job adequately? Yes, but I don't get to dictate what my priorities are on any given day. This week I'm dealing with some campus committee deadlines. Ultimately this campus committee work assists me in marketing and filling my repository so the effort expended is worthwhile. I'm sure that some day I'll join the cool kids in eschewing PowerPoint.
I've been playing with Macromedia Adobe Captivate with an eye towards putting the presentation into an audio/video format. No promises as to when I'll get to that.
The headline says it all. The California Faculty Association membership has voted. 94% of voters said yes to taking strike action should we fail to get a settlement (81% turnout). The news blackout on the fact-findings will end March 26, I believe, so the rolling strikes are likely to happen in April.
that 5 nonLIS blogs meme
I just checked my aggregator. I'm kind of surprised to find that I have no feed which could be considered non-LIS related. Dang. I need to get a life.
I'm trying to decide if I'll attend Computers in Libraries next month. It's expensive. I know it's free if you present, but I haven't felt like any of my work has been worth presenting. The repository I'm shepherding is moving along rather slowly since I'm the only person working on it. I should have something worthy of sharing by fall (keeping my fingers crossed).
I have some air miles, so the flight would be free. I can stay with my friend B. and get a nice visit in as a bonus. I still can't convince myself that I can afford the registration. I want to attend mostly for the networking opportunities. I haven't hob-nobbed with the blog people for a while. When I go to ALA I've been too busy with RDA meetings.
I should throw caution to the wind -- it's only money after all. I did submit a proposal for Internet Librarian. I don't want to get caught in that pay-for-admission thing again. I make a good living but I do not receive support for my professional travel beyond release time.
Labels: CIL2007, Computers in Libraries
I continue to ponder presentation outlines for the research officers talk. I'm presenting at the end of the day, at the end of the week. Obviously I'll need to hold their attention.
I get cheeky ideas. I imagine myself saying things about the pending strike at MPOW and how the best way to support research within this particular university environment is to ensure that your faculty can live comfortably and ensure that real learning -- as in teaching for transfer -- is taking place. It would probably be too impolitic to bring up labor issues when I'm supposed to be speaking about research.
I'll be open: I support the union fully as a matter of principle and religion. I voted to strike. It's a complicated situation, far too nuanced for this wee post. I'm ok with whatever happens. It is what it is.
The situation juxtaposes with another situation -- I have an opportunity to educate people in positions of influence. My heart is with the union rant, but my head is with my must-deal-with-digital scholarship mission. I'm a journalism grad. I know when to stay on message.
I've got the outline of my talk completed. Now is the time to flesh it out with facts, metaphors, illustrations.
I intend to take a good look at the JISC-funded CD-LOR (Community Dimensions of Learning Object Repositories) project. The Brits are light-years ahead of us when it comes to IR work. The CD-LOR project has been identifying and analysing the factors that influence practical uptake and implementation of learning object (LO) repositories within a range of different learning communities.
They have drafted some guidelines on e-Learning and Repositories and they're searching for feedback. Check out the draft at http://www.academy.gcal.ac.uk/cd-lor/DraftStructuredGuidelines.pdf and provide your feedback by 3/15 if you're a Brit.