UCHRI07-iaea
Break Out Session Group 3: Information Access, Extraction, and Analysis
What kinds of data are widely used?
• Texts
• Video materials
• Audio materials
Extraction and analysis for these expanded data sets are more complicated than for quantitative data. Storage technique is irrelevant as long as the integrity of the content is maintained. What is essential is the ability to mine, restore, input, export, use, and preserve the data.
Other issues and barriers: For work with undergraduates, these data and the tools created to work with them need to be easy to access.
What data sets are easy enough to incorporate into undergraduate instruction?
Census Data
Fernando Hernandez teaches a course on the social foundations of education. He uses census data to compare regions (like Richard’s comparisons). Students were asked to find two zip codes, pull up the census data, and compare education levels, income, etc., so that they could see how to use freely available data.
One student group did an interesting spin-off. They were asked to do a “walking or driving tour” through the different neighborhoods of L.A. They did a visual and data analysis of each area and took photos of schools, houses, and playgrounds in each one.
Students in the course were not asked to do statistical analyses, but they could, and could learn to use SPSS or another statistical analysis package to analyze data sets for closest correlates.
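A minimal sketch of what a "closest correlates" analysis could look like outside SPSS, assuming a handful of zip-code rows have already been pulled from the census site. The zip labels, the column choices, and every number below are made up for illustration; they are not census figures, and the function names are hypothetical.

```python
from math import sqrt

# Made-up placeholder values, NOT real census data.
# Each row: (zip label, median income in $1000s, % with bachelor's degree, % homeowners)
ROWS = [
    ("zip_A", 35.0, 12.0, 30.0),
    ("zip_B", 48.0, 21.0, 41.0),
    ("zip_C", 62.0, 33.0, 38.0),
    ("zip_D", 81.0, 47.0, 60.0),
    ("zip_E", 95.0, 58.0, 55.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def closest_correlate(target, candidates):
    """Return (name, r) for the candidate with the largest |r| against target."""
    scored = {name: pearson(target, values) for name, values in candidates.items()}
    best = max(scored, key=lambda name: abs(scored[name]))
    return best, scored[best]


income = [row[1] for row in ROWS]
degree = [row[2] for row in ROWS]
owners = [row[3] for row in ROWS]

best, r = closest_correlate(
    degree, {"median_income": income, "homeownership": owners}
)
print(best, round(r, 3))
```

With only two zip codes per student a correlation is not meaningful, so a sketch like this would only make sense once a class pools its zip codes into one table.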
Census data are public, available, and usable after the class is over. The data can be connected to other tools, like mapping data. Pair them with a walking tour of L.A.: when students then go on the tour, they see the city differently. Enrichment data set: Could combine with qualitative data from Richard Marciano’s data collection (red-lining).
With TeraGrid, could compare similar data sets across the US, almost in real time as you do the presentation. Assign different cities to student teams and ask them to look for similar connections. Teach them how to use the tools so they can continue to educate themselves. Collective learning; peer-teaching. How can HPC and grid technologies be mobilized to advance collective learning? Can students “collect themselves” to teach one another? Can TeraGrid create spaces like MySpace that are co-educational learning environments?
Analytical tools, file sharing, visual file-sharing (all local for the course).
Develop a network of universities that take on challenges at the undergraduate level where students around the country are gathering data, extracting data, analyzing data, visualizing data, and sharing data to solve shared problems or challenges. Bring together different disciplines at the same time; 3-4 different disciplines could collaborate to solve the problem. Single class with different perspectives. Class is thematic – one theme or challenge. Multi-disciplinary team of teachers, students from several fields. Show how they combine perspectives and share data to explore a theme.
Administrative challenge/hurdle: How can we change curriculum “on the fly” without having to change departmental guidelines or fight class accreditation challenges?
Imagine a center-core humanities course (a set of literature from a specific area). Digitize the text; mine the text; set it in historical perspective; collect other materials to create the setting of the story (news clippings, photos, music, art).
Todd Presner has the Hypermedia Berlin project: digital maps, historically time-sliced, with hot links to the important cultural sites of the day (museums, galleries, opera houses). Click on a hot link and it opens a page that tells you about that site. Material gets deeper and deeper (e.g. investigate performers at a particular opera house).
Imagine larger and larger databases that could be drawn upon for more and more sophisticated courses: data mining, text mining, music mining, art, travel habits, travel vehicles, maps, circulation of goods and commodities. All of this is built upon available digitized databases.
How widely known are these digitized databases? At this point, undergraduate faculty may not know about them. There’s a range of collections that are well known to specialists in a field but not widely known by others. How much are they used in teaching, even by those who know about them? Hypermedia Berlin is largely known and used at UCLA.
Tools: Existing and Needed
• Directory of databases in each field, i.e. a database of databases
• Metadata collection tool for each database
• On-line tutorials for using each database
The directory should be a dynamic space that’s constantly updated. Students will update it (faculty researchers won’t). Students will build their own activities and knowledge around a database.
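One way to picture the "database of databases" is as a small catalog of metadata records that anyone can search and annotate. The sketch below is only an illustration under stated assumptions: the class names, the metadata fields, and the three entries are all hypothetical and do not describe any existing directory.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseEntry:
    """One record in the directory: metadata about a single database."""
    name: str
    discipline: str
    data_types: list                 # e.g. ["text", "maps", "audio"]
    tutorial: str = ""               # pointer to an on-line tutorial, if any
    notes: list = field(default_factory=list)  # added by students or faculty


class Directory:
    """A small in-memory 'database of databases'."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.name] = entry

    def annotate(self, name, note):
        # Keeps the directory dynamic: users append what they learned.
        self._entries[name].notes.append(note)

    def find(self, discipline=None, data_type=None):
        """Return sorted names of entries matching every filter given."""
        matches = []
        for entry in self._entries.values():
            if discipline is not None and entry.discipline != discipline:
                continue
            if data_type is not None and data_type not in entry.data_types:
                continue
            matches.append(entry.name)
        return sorted(matches)


# Hypothetical entries, invented for illustration only.
directory = Directory()
directory.add(DatabaseEntry("Example City Maps", "history", ["maps", "text"]))
directory.add(DatabaseEntry("Example Oral Histories", "history", ["audio"]))
directory.add(DatabaseEntry("Example Score Archive", "music", ["audio", "text"]))
directory.annotate("Example City Maps", "Useful for time-sliced comparisons.")

print(directory.find(discipline="history"))
print(directory.find(data_type="audio"))
```

Keeping the notes on each entry separate from the descriptive metadata is a design choice: it lets students add to an entry without touching the fields a curator would maintain.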
World Lecture Hall – people share lectures. In the humanities, increasingly audio performances and lectures are being recorded and posted.
Zoom tools to explore HASS data along continua of:
• Time (some would contest that if couched as “progress”); linearity is used in music, history, and some other cultural fields.
• Space: international, national, states, regions; even if contested, the national identification is retained.
• Size of entity: a local-to-global continuum (that is transitional).
• Transitional spaces: e.g. the tension between old and new in China (some old, some new, some in a transitional state); the same is seen in Latin America.
Can data be organized along trend lines? Can changes (clusters of changes) be identified? Can we identify areas of change through a cluster of variables?
Perseus.org web site.
Minnesota (?) is doing a study of sustainability as a cultural and economic phenomenon. What kind of content might be interesting? How would TeraGrid enable a project like that?
What on-line resources and technologies would be useful?
What communities are building their own databases? Time capsules of communities: what would each community want to save? Can this be created as a wiki-like community contribution space? Lack of filtering is crucial. Want to save a “collection” in multiple formats and data types.
What about cultural memory? Can we capture communities’ cultural memories using sharing technologies, tools, and structures? Ask students to create their own community cultural time capsules through interviews and photos. Students (in a high school in L.A.) were asked to explore WWII by interviewing people in their own community who fought in it. Citizen data collections.
On-line digital tools: translation tools that allow people to speak in their own languages and have instantaneous translations.
What exists beyond the tools in a PC for analysis and extraction of data? Are there domain experts who can share those tools?
What are some of the grand challenges in humanities? There are perennial questions, going back at least to the Greeks, that reappear over many years with different nuances and gather different responses.
1. What is the nature of the human?
2. Universalism vs. relativism
3. What is real vs. unreal?
These are not challenges that need answers, but they might be used to organize the data collections. They might also help to address the idea of different realities and methodologies for getting at realities: empiricism vs. mysticism; different realities and interpretations of reality, with different databases of ways to get at reality in each way.
To address these, we need:
1. Text data mining
2. Visual material mining
3. Technology that allows you to both zoom in and zoom out of collections of data
4. Zoom into different depths of data
Tools in the humanities may enable researchers to address specific questions but not the over-arching “grand general” questions. The grand perceptions or challenges could be used as metadata for organizing data sets. Different disciplines pose these questions and respond to them in very different ways.
Pat Seed’s presentation raised some very interesting questions around the movement of people over time. Cartography gives us clues to that: a new kind of evidence to answer those questions.
Technology (in the abstract) allows you to address relationships between things that, without the tools, you might have imagined were there but could never explore. Technology gives you a depth that you can’t reach without it. E.g. mining of the NY Times for bias toward American Indians.
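As a rough illustration of the text data mining need listed above, the sketch below counts which words appear near a chosen term across a set of documents, which is one simple starting point for looking at how a newspaper talks about a group or place. The sample sentences are invented placeholders, the window size is arbitrary, and the function names are hypothetical; a real study of bias would need far more careful methods than word counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrence(documents, target, window=3):
    """Count words found within `window` tokens of each occurrence of target."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            counts.update(before + after)
    return counts


# Invented placeholder sentences, not quotations from any newspaper.
SAMPLE_DOCS = [
    "The settlers described the river as wide and calm.",
    "The river was described as dangerous by the paper.",
]

neighbors = cooccurrence(SAMPLE_DOCS, "river", window=2)
print(neighbors.most_common(5))
```

In practice one would filter out common function words and compare the counts against a baseline corpus before reading anything into them.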
Digital Data in Education: Can CI be used to customize the methods and form of receipt of information? Faculty may be interested in providing teaching content in several forms, if there’s a benefit to them.
• Ease of use for entering data
• Availability of databases of content for courses
• Tools to enable collaborative learning in the classroom and outside the classroom
• Tools to enable students to immerse themselves with other students in learning environments
• Environment in which there is a range of databases relevant to that class
• Tools to immerse students in alternate reality (take them outside the classroom)
The evaluative piece is important. How do you know that students are receiving what you intended? Some aspects of the algorithm need to incorporate evaluation. Are the students understanding what was meant, its implications, its meaning? Can the students begin to evaluate their own knowledge and skills? How do you answer the “how do you know” questions?
Self-efficacy means that students can get there without the instructor. To do that, can you identify the steps necessary to reach that knowledge? Can that pathway be recreated in a way that allows students to get there on their own? Technology can give them the power to teach themselves.
Vygotsky says that no person is born a tabula rasa; there’s no blank slate. From the beginning, you’re immersed in culture. Adults loan you their reality until you can build your own. That suggests that some kind of verification is part of the learning process, and CI must reflect that.
What is reality? It is the resistance that we meet in life. Sometimes the ideas we come up with must be discarded because they meet too much resistance. Social construction of reality.
Immersive issue: a project that came from UCLA recreates classical and early medieval urban spaces as virtual representations (Bernie Frischer at Virginia and Diane Favro, an architectural historian at UCLA). Some is immersive and some is virtually immersive.
It has a game-like visual quality. It allows students to “experience” in some abstract sense the spatial experience of those architectural environments. It has led to some discoveries – like the play of light in the Parthenon, which dictated the workday for the lawmakers in that era. Materials – start with existing case studies and materials in the humanities. Grab people by what they understand.
Sustaining needs:
• Collaborative teams across institutions
• Collaborative teams within institutions
• Interdisciplinary teams (technology with humanists) to address thematic issues. HASTAC is all about building the technology/non-technology connections.
• User-friendly tools are the first prerequisite.
• Friendly equivalent of a help desk that arises from the community that has been created (peer support community)
• Ongoing meetings so people feel that there is a community (small as well as large). Ask them what their needs are.
• Archive of all the meetings to allow people to find out what they missed, with a podcast of the featured speaker and the comments and questions of the audience around the featured speaker
• Continuing blog around the presentation that has a Q&A built into it. For these students, YouTube works.
• Professional validation of the use of these tools (sub-groups at professional meetings; workshops within their field meetings)
• Methodologies: fomenting fora at national meetings will be important. Discuss methodologies and technologies at professional meetings. Provide methodology courses to grads (and undergrads).
How do you validate this in terms of retention and tenure processes? This is being addressed at several institutions. It will continue to be a struggle for another generation.
NEH has started a digital humanities section, primarily at the digitization-of-materials stage, with some projects on archives.
Cathy Davidson (HASTAC co-creator) wrote a piece in this week’s Chronicle on why we should take Wikipedia seriously as legitimate historical scholarship.
Bruce Cole is the head of NEH. Ruzena Bajcsy and David briefed him on HASTAC. He recognizes HASTAC’s importance.
Craig Calhoun at the Social Science Research Council (a privately run organization representing the social sciences) has a digital project. Both organizations should be sites that TG workshops are run through.
Whatever is developed in the first phase needs field testing, revision, and verification. One idea worth resurrecting is that if NEH, NSF, and NIH created a coordinated pool of funds for CI, it would be powerful as both a statement of its import, and a vehicle for encouraging exchange of ideas.