Showing posts with label PhD. Show all posts

Wednesday, September 21, 2016

2016-09-20: The promising scene at the end of Ph.D. trail

From right to left, Dr. Nelson (my advisor),
Yousof (my son), Yasmin (myself), Ahmed (my husband)
August 26th marked my last day as a Ph.D. student in the Computer Science department at ODU, while September 26 marks my first day as a Postdoctoral Scholar in Data Curation for the Sciences and Social Sciences at UC Berkeley. I will lead research in the areas of software curation, data science, and digital research methods. I will be honored to work under the supervision of Dr. Erik Mitchell, the Associate University Librarian and Director of Digital Initiatives and Collaborative Services at the University of California, Berkeley. I will have an opportunity to collaborate with many institutions across UC Berkeley, including the Berkeley Institute for Data Science (BIDS) research unit. It is amazing to see the light at the end of the long tunnel. Below, I talk about the long trail I took to reach my academic dream position. I'll recap the topic of my dissertation, then I'll summarize lessons learned at the end.

I started my Ph.D. in January 2011 at the same time that the uprisings of the Jan 25 Egyptian Revolution began. I was witnessing what was happening in Egypt while I was in Norfolk, Virginia. I could not do anything during the 18 days except watch all the news and social media channels, witnessing the events. I wished that my son Yousof, who was less than 2 years old at that time, could know what was happening as I saw it. Luckily, I knew about Archive-It, a subscription service by the Internet Archive that allows institutions to develop, curate, and preserve collections of Web resources. Each collection in Archive-It has two dimensions: time and URI. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox of the larger the collection, the harder it is to understand.


There are multiple collections in Archive-It about the Jan. 25 Egyptian Revolution 

There is more than one collection documenting the Arab Spring, and particularly the Egyptian Revolution. Documenting long-running events such as the Egyptian Revolution results in large collections that have thousands of URIs, and each URI has thousands of copies through time. It would be challenging for my son to pick a specific collection to learn the key events of the Egyptian Revolution. The topic of my dissertation, which was entitled "Using Web Archives to Enrich the Live Web Experience Through Storytelling", focused on understanding the holdings of the archived collections.
Inspired by “It was a dark and stormy night”, a well-known storytelling trope (https://en.wikipedia.org/wiki/It_was_a_dark_and_stormy_night/), we named the proposed framework the Dark and Stormy Archive (DSA) framework, in which we integrate “storytelling” social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users are already familiar with, such as Storify. An example of the output is below. It shows three stories for the three collections about the Egyptian Revolution. The user can gain an understanding of the holdings of each collection from the snippets of each story.


The story of the Arab Spring Collection

The story of the North Africa and the Middle East collection


The story of the Egyptian Revolution collection
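
The select-and-order step described above can be sketched roughly as follows. This is only an illustration, not the DSA implementation: the `Memento` record, its `score` field, and the idea of keeping the best-scoring page from each equal-width time slice are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Memento:
    """Hypothetical record for one archived page in a collection."""
    uri: str
    captured: datetime
    score: float  # assumed quality score; higher is better


def build_story(mementos, slices):
    """Keep the best-scoring memento in each time slice, in chronological order."""
    if not mementos or slices < 1:
        return []
    ordered = sorted(mementos, key=lambda m: m.captured)
    start, end = ordered[0].captured, ordered[-1].captured
    span = (end - start).total_seconds()
    best = {}
    for m in ordered:
        if span == 0:
            idx = 0  # every capture shares one timestamp
        else:
            offset = (m.captured - start).total_seconds()
            # Clamp so the latest capture falls in the last slice.
            idx = min(int(offset / span * slices), slices - 1)
        if idx not in best or m.score > best[idx].score:
            best[idx] = m
    # Slice indices grow with time, so this order is chronological.
    return [best[i] for i in sorted(best)]
```

Calling `build_story(pages, 2)` on four captures spread between January 25 and February 11, 2011 would return at most two pages, one per half of that period.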



With the help of the Archive-It team and partners, we obtained a ground truth data set for evaluating the stories generated by the DSA framework. We used Amazon Mechanical Turk to evaluate the automatically generated stories against the stories that were created by domain experts. The results show that the stories automatically generated by the DSA are indistinguishable from those created by human domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories. I successfully defended my Ph.D. dissertation on 06/16/2016.




Generating persistent stories from themed archived collections will ensure that future generations will be able to browse the past easily. I’m glad that Yousof and future generations will be able to browse and understand the past easily through generated stories that summarize the holdings of the archived collections.

Resources:

To continue the WS-DLers' habit of providing recaps, lessons learned, and recommendations, I will list some of the lessons learned for what it takes to be a successful Ph.D. student and advice for applying in academia. I hope these lessons and advice will be useful for future WS-DLers and grad students. Lessons learned and advice:
  • The first one, and the one I always put in front of me: You can do ANYTHING!!

  • Getting involved in communities in addition to your academic life is useful in many ways. I have participated in many women in technology communities such as the Anita Borg Institute and the Arab Women in Computing (ArabWIC) to increase the inclusion of women in technology. I was awarded travel scholarships to attend several well-known women in tech conferences: CRA-W (Graduate Cohort 2013), Grace Hopper Celebration of Women in Computing (GHC) 2013, GHC 2014, GHC 2015, and ArabWIC 2015. I am a member of the leadership committee of ArabWIC. Attending these meetings builds maturity and expands the personal connections and development that prepare students for future careers. I also gained leadership skills from being part of the leadership committee of ArabWIC.
  • Publications matter! If you are in WS-DL, you will have to get the targeted score 😉. You can learn more about the point system on the wiki. If you plan to apply in academia, your list of publications matters a great deal.
  • Teaching is important for applying in academia. 
  • Collaboration is key to increasing your connections and will also help in developing your skills for working in teams.
  • And last, being a mom holding a Ph.D. is not easy at all!!
The trail was not easy, but it is worth it. I have learned and changed much since I started the program. Having enthusiastic and great advisors like Dr. Nelson and Dr. Weigle is a huge support that results in a happy ending and an achievement to be proud of.

-------
Yasmin


Friday, April 15, 2016

2016-04-15: How I learned not to work full-time and get a PhD

ODU's commencement on May 7th marks the last day of my academic career as a student. I began my career at ODU in the Fall of 2004 and graduated with my BS in CS in the Spring of 2008, at which point I immediately began my Master's work under Dr. Levinstein. I completed my MS in Spring 2010, spent the summer with June Wright (now June Brunelle), and started my Ph.D. under Dr. Nelson in the Fall of 2010 (which is referred to as the Great Bait-and-Switch in our family). I will finish in the Spring of 2016 only to return as an adjunct instructor teaching CS418/518 at ODU in the Fall of 2016.


On February 5th, I defended my dissertation "Scripts in a Frame: A Framework for Archiving Deferred Representations" (above picture courtesy Dr. Danette Allen, video courtesy of Mat Kelly). My research in the WS-DL group focused on understanding, measuring, and mitigating the impacts of client-side technologies like JavaScript on the archives. In short, we showed that JavaScript causes missing embedded resources in mementos, leading to lower quality mementos (according to web user assessment). We designed a framework that uses headless browsing in combination with archival crawling tools to mitigate the detrimental impact of JavaScript. This framework crawls more slowly but more thoroughly than Heritrix and will result in higher quality mementos. Further, if the framework interacts with the representations (e.g., click buttons, scroll, mouseover), we add even more embedded resources to our crawl frontier, 92% of which are not archived.
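
As a rough illustration of the last point, the bookkeeping for a crawl frontier might look like the sketch below. It is not code from the framework itself: the function name `extend_frontier` and the example URIs are hypothetical, and the step that actually loads a page in a headless browser and triggers events is left out entirely.

```python
from collections import deque


def extend_frontier(frontier, seen, discovered):
    """Queue URIs that have not been seen before and return the ones added."""
    added = []
    for uri in discovered:
        if uri not in seen:
            seen.add(uri)
            frontier.append(uri)
            added.append(uri)
    return added


frontier, seen = deque(), set()

# Embedded resources requested when the page first loads (hypothetical URIs).
extend_frontier(frontier, seen, ["http://example.com/style.css",
                                 "http://example.com/app.js"])

# Resources requested only after simulated clicks, scrolls, or mouseovers.
new_after_events = extend_frontier(frontier, seen,
                                   ["http://example.com/app.js",
                                    "http://example.com/more.json"])
```

Here `new_after_events` holds only the resource that the interactions revealed, which is the kind of URI a crawler that never executes client-side events would miss.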


Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations from Justin Brunelle

En route to these findings, we demonstrated the impact of JavaScript on mementos with our now-[in]famous CNN Presidential Debate example, defined the terms deferred representations to refer to representations dependent upon JavaScript to load embedded resources, descendants to refer to client-side states reached through the execution of client-side events, and published papers and articles on our findings (including Best Student Paper at DL2014 and Best Poster at JCDL2015).


At the end of WS-DLer academic tenures, it is customary to provide lessons learned, recommendations, and recaps of their academic experiences useful to future WS-DLers and grad students. Rather than recap the work that we have documented in published papers, I will echo some of my advice and lessons learned for what it takes to be a successful Ph.D. student.

Primarily, I learned that working while pursuing a Ph.D. is a bad idea. I worked at The MITRE Corporation throughout my doctoral studies. It took a massive amount of discipline, a massive amount of sacrifice (from myself, friends, and family), a forfeiture of any and all free time and sleep, and a near-lethal amount of coffee. Unless a student's "day job" aligns or overlaps significantly with her doctoral studies (I got close, but no cigar), I strongly recommend against doing this.

I learned that a robust support system (family, friends, advisor, etc.) is essential to being a successful graduate student. I am lucky that June is patient and tolerant of my late nights and irritability during paper season, my family supported my sacrifices and picked up the proverbial slack when I was at conferences or working late, and that Dr. Nelson dedicates an exceptional portion of his time to his students. (Did I say that just like you scripted, Dr. Nelson?) I learned to challenge myself and ignore the impostor syndrome.

I learned that a Ph.D. is life-consuming, demanding of 110% of a student's attention, and hard -- despite evidence to the contrary (i.e., they let me graduate) -- they don't give these things away. I also learned about what real, capital-R "Research" involves, how to do it, and the impact that it has. This is a lesson that I am applying to my day job and current endeavors.

I learned to network. While I don't subscribe to the adage "It's not what you know, it's who you know", I will say that knowing people makes things much easier, more valuable, and more impactful, and is essential to success. However, if you don't know the "what", knowing the "who" is useless.

I learned that not all Ford muscle cars are Mustangs (even though they are clearly the best), that it's best to root for VT athletics (or at least pretend), that I am terrible at commas, and that giving your advisors homebrew with your in-review paper submissions certainly can't hurt; the best collaborations and brainstorming sessions often happen outside of the office and over a cup of coffee or a pint of beer.

Finally, I learned that finishing my Ph.D. before my son arrived was one of the best things I've done -- even if mostly by luck and divine intervention. I have thoroughly enjoyed redirecting the energy previously dedicated to staying up late, writing papers, and pounding my head against my keyboard to spending time with June, Brayden, and my family.

Despite these hard lessons and a difficult ~5 years, pursuing a doctorate has been a great experience and well worth the hard work. I look forward to continued involvement with the WS-DL group, ODU, my dissertation committee, and sharing my many lessons learned with future students.


--Dr. Justin F. Brunelle

Tuesday, September 1, 2015

2015-09-01: From Student To Researcher II










After successfully defending my Master's Thesis, I accepted a position as a Graduate Research Assistant at Los Alamos National Laboratory (LANL) Library's Digital Library Research and Prototyping Team.  I now work directly for Herbert Van de Sompel, in collaboration with my advisor, Michael Nelson.

Up to this point, I worked for years as a software engineer, but then re-entered academia in 2010 to finish my Master's Degree.  I originally just wanted to be able to apply for jobs that required Master's Degrees in Computer Science, but during my time working on my thesis, I discovered that I had more of a passion for the research than I had expected, so I became a PhD student in Computer Science at Old Dominion University.  During the time of my Master's Degree, I had taken coursework that counts toward my PhD, so I am free to accept this current extended internship while I complete my PhD dissertation.

LANL is a fascinating place to work.  In my first week, we learned all about safety and security. We learned not only about office safety (don't trip over cables), but also nuclear and industrial safety (don't eat the radioactive material).  This was in preparation for the possibility that we might actually encounter an environment where these dangers existed. One of the more entertaining parts of the training was being aware of wildlife and natural dangers, such as rattlesnakes, falling rocks, flash floods, and tornadoes.  We also learned about the more mundane concepts like how to use our security badge and how to fill out timesheets.  I was fortunate to meet people from a variety of different disciplines.


We have nice, powerful computing equipment, good systems support, excellent collaboration areas, and very helpful staff. Everyone has been very conscientious and supportive as I have acquired access rights and equipment.

By the end of my first week, I had begun working with the Prototyping Team.  They shared their existing work with me, educating me on back-end technical aspects of Memento as well as other projects, such as Hiberlink. My team members Harihar Shankar and Lyudmila Balakireva have been nice enough to answer questions about the research work as well as general LANL processes.

I am already doing literature reviews and writing code for upcoming research projects.  We just released a new Python Memento Client library in collaboration with Wikipedia. I am also evaluating Jupyter for use in future data analysis and code collection.  I have learned so much in my first month here!

I know my friends and family miss me back in Virginia, but this time spent among some of the best and brightest in the world is already shaping up to be an enjoyable and exciting experience.

--Shawn M. Jones, Graduate Research Assistant, Los Alamos National Laboratory

Thursday, September 18, 2014

2014-09-18: A tale of two questions

(with apologies to Charles Dickens, Robert Frost, and Dr. Seuss)


"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, ..." (A Tale of Two Cities, by Charles Dickens).

At the end of this part of my journey, it is time to reflect on how I got here, and what the future may hold.

Looking back, I am here because of answering two simple questions.  One from a man who is no longer here, one from a man who still poses new and interesting questions.  Along the way, I've formed a few questions of my own.

The first question was posed by my paternal uncle, Bertram Winston.  Uncle Bert was a classic type A personality.  Everything in his life was organized and regimented.  When planning a road trip across the US, he would handwrite the daily itinerary: when to leave a specific hotel, how many miles to the next hotel, phone numbers along the way, people to visit in each city, and sites to see.  He would snail-mail a copy of the itinerary to each friend along the way, so they would know when to expect him and Aunt Artie to arrive (and to depart).  He did this all before MapQuest and Google Maps.  He did all of this without a computer, using paper maps and AAA tour books.
Uncle Bert and Aunt Artie

Bert took this attention to detail to the final phase of his life.  As he made preparations for his end, he went through their house and boxed up pictures and mementos for friends and family.  These boxes would arrive unannounced, and were full of treasures.  After receiving, opening, and sharing this detritus with Mary and our son Lane, I thanked Bert for helping to answer some of the questions that had plagued me since I was a child.  During the conversation, he posed the first question to me.  Bert said that he had been through his house many times and still had lots of stuff left that he didn't know what to do with.  He said, "What will I do with the rest?"  I said that I would take it, all of it, and that I would take care of each piece.

I continued to receive boxes until his death. 
Josie McClure, my muse.
With each, Mary, Lane, and I would sit in our living room and I would explain the history behind each memento.  One of these mementos was a picture of Josie McClure.  She became my muse for answering the second question.



Dr. Michael L. Nelson,
my academic parent.
The second question was posed by my academic "parent," Michael L. Nelson.  One day in 2007, he stopped me in the Engineering and Computational Sciences Building on the Old Dominion University campus, and posed the question "Are you interested in solving a little programming problem?"  I said "yes" not having any idea about the question, the possible difficulties involved, the level of commitment that would be necessary, or the incredible highs and lows that would torment my soul.  But I did know that I liked the way he thought, his outlook on life, and his willingness to explore new ideas.

Answering two simple questions resulted in a long journey, filled with incredible highs brought on by discovering things that no one else in the world knew or understood, and incredible lows brought on by no one else in the world knowing or understanding what I was doing.  My long and tortuous trail can be found here.

While on this journey, I have accreted a few things that I hope will serve me well.

My own set of questions:


1.  What is the problem??  Sometimes just formulating the question is enough to see the solution, or puts the topic into perspective and makes it non-interesting.  Formulating the problem statement can be an iterative process where constant refining reveals the essence of the problem.

2.  Why is it important??  The world is full of questions.  Some are important, others are less so.  Everyone has the same number of hours per day, so you have to choose which questions are important in order to maximize your return on the time you spend.

3.  What have others done to try and solve the problem??  If the problem is good and worthy, then take a page from Newton and see what others have done about the problem.  It may be that they have solved the problem and you just hadn't been able to spend the time trying to find an existing solution.  If they haven't solved the problem, then you might be able to say (as Newton was wont to say) "If I have seen further it is by standing on the shoulders of giants."

4.  What will I do to solve the problem??  If no one has solved the problem, then how will you attack it??  How will your approach be different or better than everything done  by everyone else??

5.  What did I do to prove I solved the problem??  How to show that your approach really solved the problem??

6.  What is the conclusion??  After you have labored long and hard on a problem, what do you do with the knowledge you have created??

Be an active reader.

Read everything closely to ensure that I understand what the author was (and was not) saying.  Making notes in the margins on what has been written.  Noting the good, the bad, and the ugly.  If it is important enough, track down the author and speak to them about the ideas and thoughts they had written.  Imagine if you will, receiving a call from a total stranger about something that you've published a few years before.  It means that someone has read your stuff, has questions about it, and that it was important enough to talk directly to you.  How would you feel if that happened to you??  I've made those calls and you can almost feel the excitement radiating through the phone.

Understand all the data you collect.

In keeping with Isaac Asimov's view on data: "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'"  When we conduct experiments, we collect data of some sort, be that memento temporal coverage, public digital longevity, digital usage patterns, or data of other sorts and types.  Then we analyze the data, and try to glean a deeper understanding.  Watch for the outliers; the data that "look funny" have additional things to say.

Everyone has stories to tell.  

Our stories are the threads of the fabric of our lives.  Revel in stories from other people.  The stories they choose to share are an intimate part of what makes them who they are.  Treat their stories with care and reverence, and they will treat yours the same way.

Don't be afraid to go where others have not.  

 During our apprenticeship, all our training and work point us to new and uncharted territories.  To wit:
"...
Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference."
(The Road Not Taken, by Robert Frost)

Remember through it all;


The highs are incredible, the lows will crush your soul, others have survived, and you are not alone.

And in the end,

"So...
be your name Buxbaum or Bixby or Bray
or Mordecai Ali Van Allen O'Shea,
you're off to Great Places!
Today is your day!
Your mountain is waiting.
So...get on your way!"
(Oh, the Places You'll Go!, by Dr. Seuss)





With great fondness and affection,

Chuck Cartledge
The III. A rapscallion.  A husband.  A father.  A USN CAPT.  A PhD.  A simple man.








Thanks to Sawood Alam, Mat Kelly, and Hany SalahEldeen for their comments and review of "my 6 questions."  They were appreciated and incorporated.

Tuesday, September 16, 2014

2014-09-16: A long and tortuous trail to a PhD

(or how I learned to embrace the new)


I am reaching the end of this part of my professional, academic, and personal life.  It is time to reflect and consider how I got here.

The trail ahead.
When I started, I thought that I knew the path, the direction, and the work that it would take.  I was wrong.  The path was rugged, steep, and covered with roots and stones that lay in wait to trip the unwary.  The direction was not straightforward.  At times I wasn't sure how to set my compass, and which way to steer.  In the end, there was more work than I thought in the beginning.  But the end is nigh.  The path has been long.  At times the direction was confusing.  The work seemed never ending.  This is a story of how I got to the end, using a little help from "a friend" at the end of this post.

Bringing the initially disparate disciplines of graph theory, digital preservation, and emergent behavior together to solve a particular class of problem is/was non-trivial.  Sometimes you have to believe in a solution before you can see it.

Graph theory is: the study of graphs, the mathematical structures used to model pairwise relations between objects.  In my world, I focused on the application of graph theory as it applied to the creation of graphs that had the small-world properties of a high clustering coefficient and a low average path length.
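
The two small-world measures mentioned above can be computed for any small undirected graph with a few lines of Python. The sketch below is a generic textbook calculation, not code from the USW simulator; the adjacency-dictionary representation and the function names are choices made only for this example.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    `adj` maps each node to the set of its neighbors.
    """
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count the neighbor pairs that are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for source in adj:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

For a triangle a-b-c with one extra node d attached to c, the clustering coefficient is 7/12 and the average path length is 4/3. A graph is usually called small-world when the first measure is high and the second is low, compared with a random graph of the same size.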

Digital preservation is: a series of managed activities necessary to ensure continued access to digital materials for as long as they are needed.  In my world, I focused on preserving the "essence" of a web object (WO), not the entire object.  WOs can include links to resources and capabilities that are protected and not visible on the "surface web."  While this web "dark matter" could contain unknown wealth and information,  I was interested in the essence of the WO and preserving that for the long term.

Emergent behavior is: unanticipated behavior shown by a system.  In my world, I took Craig Reynolds' axiom of imbuing objects with a small set of rules, turning them loose, and seeing what happens.  My rules guided the WOs through their explorations of the Unsupervised Small-World (USW) graph, how they made decisions about which other WOs to connect to, and when and where to make preservation copies.

Graph theory, digital preservation, and emergent behavior are brought together in the USW process; the heart of my dissertation.

At the end of a very long climb, there is:

A video of the USW process in action:



My PhD Defense PowerPoint presentation on SlideShare.






 A video of my dissertation defense can be found here.

 My dissertation in two different sized files.
A small (19 MB) version of my dissertation.

A much larger (619 MB) version of my dissertation can be found here.

A simple chronology from the Start in 2007 through the PhD in 2014 (with a little help from my friend).

2007: I started down this trail
The "story" of my dissertation. (My friend.)

2007 - 2013: The Unsupervised Small-World (USW) simulator (on GitHub) directly supported almost all phases of my work.  It went through many iterations from its first inception until its final form.  What started as a simple way to create simple graphs in Python passed through a couple of other scripting languages and stabilized as a message-driven, 5K-line C++ program.  The program served as a way to generate USW graphs to test different theories and ideas.  The simulator generated data, while offline R scripts did the heavy-lift analysis.  One of my favorite graphs was a by-product of the simulator (and it didn't have anything to do with USW).

2008: Emergent behavior: a poster entitled "Self-Arranging Preservation Networks."

2009: Emergent behavior and graph theory: a short paper entitled "Unsupervised Creation of Small World Networks for the Preservation of Digital Objects."

2009: Graph theory: Doctoral consortium

2010: Digital preservation: a long paper entitled: "Analysis of Graphs for Digital Preservation Suitability."

2011: Graph theory: an arXiv technical report entitled: "Connectivity Damage to a Graph by the Removal of an Edge or Vertex."

2011: Graph theory: a WS-DL blog article: "Grasshopper, prepare yourself. It is time to speak of graphs and digital libraries and other things."

2012: Digital preservation: a long paper entitled: "When Should I Make Preservation Copies of Myself?"

2013: Digital preservation: a WS-DL blog article: "Preserve Me! (... if you can, using Unsupervised Small-World graphs.)"

2013: The USW robot, my own Marvin, (on GitHub) grew from the lessons learned from the simulator.  Marvin worked with Sawood Alam's HTTP Mailbox application to actually create USW graphs based on data in the USW instrumented Web Pages.

2013 - 2014: Emergent behavior: working with Sawood Alam and his HTTP Mailbox application.  The Mailbox was the communication mechanism used by USW Web Objects.

2014: Digital preservation: an updated long paper entitled: "When Should I Make Preservation Copies of Myself?"

2014: My PhD defense (link to set of slides).

2014: LaTeX: a WS-DL blog article: LaTeX References, and how to control them.

2014: LaTeX: a WS-DL blog article: An ode to the "Margin Police," or how I learned to love LaTeX margins.

2014: Dissertation submitted and accepted by the Office of the Registrar.

In many movies, there is one line that stands out.  One line that resonates.  One line that sums up many things.  The one that comes to my mind was uttered by Sean Connery as William Forrester in the movie "Finding Forrester" when he pointed to the faded photograph on the wall and said: "I'm that one."

The trail and the road were long and trying, with many places where things could have gone awry. But in the end, like Kwai Chang Caine and his brazier, the way out of the temple was shown and the last trial was completed.

Chuck

Published works (ready for copying and pasting):
  • Sawood Alam, Charles L. Cartledge, and Michael L. Nelson. HTTP Mailbox - Asynchronous RESTful Communication. Technical report, arXiv:1305.1992, Old Dominion University, Computer Science Department, Norfolk, VA, 2013.
  • Sawood Alam, Charles L. Cartledge, and Michael L. Nelson. Support for Various HTTP Methods on the Web. Technical report, arXiv:1405.2330, Old Dominion University, Computer Science Department, Norfolk, VA, 2014.
  • Charles Cartledge. Preserve Me! (... if you can, using Unsupervised Small-World graphs.). http://ws-dl.blogspot.com/2013/10/2013-10-23-preserveme-if-you-can-using.html/, 2013.
  • Charles L. Cartledge and Michael L. Nelson. Self-Arranging Preservation Networks. In Proc. of the 8th ACM/IEEE-CS Joint Conf. on Digital Libraries, pages 445 – 445, 2008.
  • Charles L. Cartledge and Michael L. Nelson. Unsupervised Creation of Small World Networks for the Preservation of Digital Objects. In Proc. of the 9th ACM/IEEE-CS Joint Conf. on Digital Libraries, pages 349 – 352, 2009.
  • Charles L. Cartledge and Michael L. Nelson. Analysis of Graphs for Digital Preservation Suitability. In Proc. of the 21st ACM conference on Hypertext and hypermedia, pages 109 – 118. ACM, 2010.
  • Charles L. Cartledge and Michael L. Nelson. Connectivity Damage to a Graph by the Removal of an Edge or Vertex. Technical report, arXiv:1103.3075, Old Dominion University, Computer Science Department, Norfolk, VA, 2011.
  • Charles L. Cartledge and Michael L. Nelson. When Should I Make Preservation Copies of Myself? Tech. Report arXiv:1202.4185, 2012.
  • Charles L. Cartledge and Michael L. Nelson. When Should I Make Preservation Copies of Myself? In Proc. of the 14th ACM/IEEE-CS Joint Conf. on Digital Libraries, page TBD, 2014.

Published works (ready for BibTex):