Distance Reading Wiki

From Brock University's Digital Humanities Compendium

Revision as of 23:17, 2 October 2011

Distance Reading Articles

Cohen "Analyzing Literature by Words and Numbers"

Cohen "In 500 Billion Words..."

Crane "What Do You Do with a Million Books"

Moretti Graphs, Maps, Trees **In case some of you can't find the text, these articles are similar.

Google Ngram Viewer and Other Supplementary Material

Google Ngram Viewer

Google Datasets

Quantitative Analysis of Culture Using Millions of Digitized Books

"Culturomics" Defined

Cohen "Analyzing Literature by Words and Numbers" Review and Notes

Review

In her concise and balanced article, Patricia Cohen is clearly enamoured with the potential of digital tools and resources to enhance and supplement humanities research, yet she has evident concerns that strictly statistical and quantitative information may come to replace the interpretive and critical aspects of the Humanities. To highlight this concern, she notes several cases in which the mere presence or absence of a word, term, or phrase would mislead a strictly statistical examination of the text. Despite these misgivings, she gives credit to the benefits such research could yield. Her respect for the work of Dan Cohen and Fred Gibbs shows in her presentation of their research as a means of complementing and reconsidering traditional historical and literary scholarship. Despite her interest, however, she is quick to reiterate the concerns of several scholars who argue that digital resources, as they develop, stand not only to “reduce literature and history to a series of numbers” but also “to shape the kinds of questions” scholars will ask. In addition to her commentary on the methodological concerns which digital tools raise for humanities research, Patricia Cohen notes the rising role which corporations like Google play in framing resources, literature, and scholarship online. On this matter she highlights the traditional, yet valuable, critiques most often raised against monopolistic information ownership, including control, cost, and access.

Notes on Article

Statistical analysis, with texts being “electronically scoured for key words and phrases that might offer fresh insight into the minds of Victorians”

New research as a result of new digital tools and databases

Technology is “transforming the study of literature, philosophy, and other humanistic fields that haven’t necessarily embraced large-scale quantitative analysis”

Notes that Dan Cohen and Fred Gibbs are using this sort of digital technology to search and graph the frequency of specific words and word groupings (e.g., God, love, work, science, and industrial) from 1789 until 1914

Cohen and Gibbs are using the results of this data to reaffirm or question traditional historical and literary assumptions and interpretations regarding the Victorian age

Author notes that while there is interest in this sort of research there is also some hesitation by those concerned that such resources would “reduce literature and history to a series of numbers, squeezing out important subjects that cannot be easily quantified.”

Scholar Matthew Bevis raised the comment that these sorts of programs are “not just tool[s]” but that they actually have the potential to “shap[e] the kind of questions someone in literature might even ask.”

The author is quick to note that Cohen and Gibbs's project, “Reframing the Victorians,” is just one of a dozen to be recognized by Google, itself a major corporation involved in the digitization of texts.

In mentioning Google and its rising role in, and control of, digital texts and online libraries, Patricia Cohen observes that such plans have many concerned about the potential for one organization to control this many resources. Control, as Cohen notes, often translates into larger costs as well as questions of availability and access for individuals.

Patricia Cohen highlights that many scholars believe that digital tools will let them do more comprehensive studies of certain ideas, eras, and cultures, since they are now able to search a much larger number of texts and books.

In response to this enthusiasm, however, Dan Cohen is quick to acknowledge that much of the early statistical analysis of texts “is anything but clear” in its results. Software programmed to find specific terms inherently passes over related words and phrases, and has the potential to miss the larger context in which words are used (e.g., syntax).

The point being that, on a certain level, fewer or more references to a specific term do not necessarily equate to a specific focus or disuse.

Cohen concludes by asking whether statistical analysis of texts will “overshadow meaning and interpretation” or whether it will serve to “highlight the importance and value of close reading...[and]…heightened engagement with words, paragraphs and lines of verse.”
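
The kind of frequency counting described in these notes can be sketched in a few lines of Python. This is only a minimal illustration, not Cohen and Gibbs's actual method: the four-title corpus, the function name `term_counts_by_year`, and the chosen search terms are all invented for the example. It also shows the caveat Dan Cohen raises above, since an exact-match search for "science" skips "scientific" and "sciences".

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (year, title) pairs, invented for illustration.
CORPUS = [
    (1800, "Sermons on the Love of God"),
    (1850, "The Progress of Science and Industry"),
    (1850, "Scientific Essays on Work"),
    (1900, "Sciences of the Industrial Age"),
]

def term_counts_by_year(corpus, terms):
    """Count exact, case-insensitive matches of each term, grouped by year."""
    wanted = {t.lower() for t in terms}
    counts = defaultdict(Counter)
    for year, text in corpus:
        # Split the text into lowercase alphabetic words.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in wanted:
                counts[year][word] += 1
    # Years with no matching word never appear in the result.
    return {year: dict(c) for year, c in counts.items()}

counts = term_counts_by_year(CORPUS, ["god", "science", "work"])
for year in sorted(counts):
    print(year, counts[year])
```

In this sketch the 1900 title produces no match at all, even though it is plainly about science. A real study would need some way of grouping related word forms, and a reader would still have to interpret what the resulting numbers mean.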

Cohen "In 500 Billion Words..." Review and Notes

Review

Notes on Article

Article discussing Google’s “mammoth database culled from nearly 5.2 million digitized books” which is now available and which stands to open a vast range of “possibilities for research and education in the humanities”

Focus of Google’s resource, created from books published between 1500 and 2008, is to serve as a database which allows for anyone to input searches of up to five words and review “a graph that charts the phrase’s use over time”

Cohen admits that this sort of mental exercise is addictive and that it provides insightful material for consideration.

For her own part, Patricia Cohen runs a number of searches, comparing references to women with those to men, looking up Mickey Mouse and Marilyn Monroe, and discovering that “Tiananmen Square” has more coverage in English than in Chinese texts.

This resource, designed by Mr. Lieberman Aiden and Jean-Baptiste Michel, seeks to use digital resources in order to “transform our understanding of language, culture and the flow of ideas.”

Lieberman Aiden, a mathematician and genomics scholar, set as his goal to demonstrate “what becomes possible when you apply very high-turbo data analysis to questions in the humanities.” He has called his method “culturomics,” which is defined by Wikipedia as “a form of computational lexicology that studies human behaviour and cultural trends through the quantitative analysis of digitized texts.”

The goal of this resource is to demonstrate the research possibilities which digital tools bring to humanities scholarship and which have traditionally been avoided by literary and history academics.

Some of the insights this project offers are analyses of cultural trends, such as the rate of invention and the duration of fame, as well as of the growth and development of the English lexicon.

Cohen quotes Steven Pinker, a Harvard linguist who argues that the output of such database searches “makes results more convincing and more complete.”

Pinker believes that despite resistance in the humanities to such research, the results of these sorts of databases will make their usage more “mainstream.”

In response, Patricia Cohen turns to Louis Menand, a Harvard English professor who, while acknowledging the benefits of such resources, also notes that some such claims can be “exaggerated.”

Menand also comments that among those advocating Culturomics there was not a single humanist involved in the project.

Similarly, Alan Brinkley, a Columbia American studies professor, is noted as wondering, with some doubt, what exactly these statistics were meant to accomplish.

In response to such traditional Humanist hesitation to their work, both Michel and Lieberman Aiden emphasize that their project “simply provided information” in the hopes that scholars would be “willing to examine the data.”

Both readily acknowledge that “cultural references tend to appear in print much less frequently than everyday words.”

Cohen notes that the main impetus for Lieberman Aiden and Michel’s research into Culturomics arose from the 18 months they spent scrutinizing Anglo-Saxon texts for irregular verbs. When they read about Google’s plans to digitize texts and create an online library, they sought to collaborate with Peter Norvig at the company.

Cohen also notes that despite ongoing legal actions against Google for copyright infringement, the Culturomics project is exempt as neither “the books themselves, nor even sections of them” can be read.
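
To make concrete what a chart of "the phrase's use over time" involves, the following is a minimal sketch in Python. It is not Google's implementation: the sample sentences, the names `ngrams` and `phrase_share_by_year`, and the choice to divide by the number of same-length word sequences in each year are all assumptions made for illustration. Only the five-word limit comes from the notes above.

```python
from collections import Counter

MAX_N = 5  # the notes above mention searches of up to five words

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_share_by_year(texts_by_year, phrase):
    """For each year, divide the phrase's count by the count of all
    word sequences of the same length (an assumed normalization)."""
    target = tuple(phrase.lower().split())
    n = len(target)
    if not 1 <= n <= MAX_N:
        raise ValueError(f"phrase must be 1 to {MAX_N} words long")
    shares = {}
    for year, texts in texts_by_year.items():
        grams = Counter()
        for text in texts:
            # Naive whitespace tokenization; punctuation is not handled.
            grams.update(ngrams(text.lower().split(), n))
        total = sum(grams.values())
        shares[year] = grams[target] / total if total else 0.0
    return shares

# Hypothetical sample, invented for illustration.
SAMPLE = {
    1950: ["the cold war began", "the war ended"],
    2000: ["the cold war is history"],
}
shares = phrase_share_by_year(SAMPLE, "cold war")
for year in sorted(shares):
    print(year, shares[year])
```

Dividing by the yearly total matters because the number of books published differs greatly from year to year, so a raw count alone would mostly track how much was printed. As with any such figure, the resulting shares say nothing by themselves about why a phrase rose or fell.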

Crane "What Do You Do with a Million Books" Review and Notes

Moretti "Graphs, Maps, Trees" Review and Notes

Distance Reading Discussion Questions

Discussion Question 1:

To start out this week's debate we thought we'd begin with a couple of general questions. With the advent of the digital environment a much larger number of texts and resources have been made available. Practically speaking, is there a point at which there is simply too much available to be able to sort through? In other words, is there ever a point at which there is so much information, or the scale of it is so large, that it in fact impedes effective research?

Discussion Question 2:

While the digital environment has given rise to a range of new and useful means of textual analysis, how does a statistical critique of literature and resources fit within traditional approaches? Do the possibilities for digital texts surpass or fall short of traditional approaches to printed materials?

Discussion Question 3:

Crane discusses the possibilities for digital technologies to complete document analysis, information extraction, multilingual translations, and textual evaluation. Do you think it is practical that programs are created to evaluate texts? What is the point at which such projects cease assisting scholars in locating information and begin determining for academics what information is relevant? Aren't evaluation, critique, and translation essential aspects of the Humanist undertaking? Can evaluation and critique be reduced to quantification or are they more than the end sum of an equation?

Discussion Question 4:

Cohen, in "Analyzing Literature by Words and Numbers," concludes by quoting a scholar who argues that "large-scale, quantitative research is likely to highlight 'the importance and value of close reading; the detailed, imaginative, heightened engagement with words, paragraphs and lines of verse.'" However, in our readings generally, little mention has yet been made of how to engage with and critically use these resources. Similarly, Cohen's article "In 500 Billion Words, New Window on Culture" quotes Steven Pinker, a Harvard linguist who argues that the output of database searches of digitized texts “makes results more convincing and more complete.” With a rising focus in academic institutions on quantitative analysis and results, do you think that DH will reassert Humanist interpretation or allow "statistical measures" to "overshadow" meaning?

Distance Reading Responses

     ** Let it Begin **

Well I'll kick things off this week. I'll start by addressing the first question.

The massive catalogues of books will radically change acceptable practices within history. I have no doubt of this. Where a theory could once be proposed on the reading of 30 books, it might now require 300 to hold the same weight. While this seems ludicrous, I don't think it will amount to a whole lot more work for the historian in the long run. This of course can only be the case if we refine our methods of research and adapt. New methods to pull out relevant points and skim over the useless bits must be developed hand in hand with the massive catalogues of digitized books. And of course there are other aspects that must be taken into account. Digital archivists will be very important in this endeavour.

So I'll say no. I don't think that the scale of information will necessarily impede research. It COULD, but I don't think it has to.

RiotousRyan

I'll go next... I think the first question is very interesting. I can understand how millions and millions of books at one's disposal could be seen as problematic for research, with just the sheer volume of material one would have to go through. I think what is important to keep in mind though is that as these digital libraries grow, evolve and change, so too will the tools used to search in them. I think with effective search tools, a library of millions of digital books will be kept fairly manageable, as only certain books would be brought forward from a search, or only sections of an article would need to be examined rather than the entire thing. Some of these techniques are addressed in the Crane article. I know this opens the questions of how searches are structured, how items are catalogued or tagged, etc., but those are things that require their own examination altogether, and aren't issues I think I could address adequately in this space. The fact of the matter is, the same amount of material is still out there regardless of whether it is digitized or not; digitizing it just makes it more easily available to more people. I think Cohen's "Analyzing Literature..." article is correct when it states that digital research will offer a new kind of comprehensiveness that previous research was lacking.

As for the second question, I think the statistical analysis of literature can be used as a tool within traditional approaches, such as the example in the Cohen article of the work done on Victorian mindsets towards progress and science. No one is forcing anyone to use the digital data that is being made available; it is the researcher's choice whether to use digital resources in their research or not. In the Victorian example, digital tools provided some information that would've been very difficult to gather otherwise. The same article also states, however, that these tools aren't just tools but are changing the kinds of questions being asked by humanists. I don't necessarily feel that is a bad thing, though - it is opening another field of inquiry. Just because a new field opens does not mean that another, more traditional one has to close. Melanie

I like the 3rd question because Crane is discussing the changing nature of the texts that are being digitized and the wealth of new possibilities and problems that converting every book into a digital format presents. Translation alone is an issue he identifies when considering, for example, the books related to Classics. This is a discipline that spans many different languages, and the ability of translation programs to provide an accurate, contextually sound translation is one issue, along with the need to make all of these databases available in both their original format and in English. Crane also points out that initial text capture was done as a series of PDFs and other large single-object transfers. Today, the ability (and the need) to identify ALL of the words (or objects) in the texts is crucial to allow for deep research. If the file is a PDF without the ability to explore within it (except by actually reading it), then the purpose of transferring this information into the digital realm is defeated. I also appreciated how Crane and others identified the troubling fact that Google, which is after all a commercial enterprise, is positioning itself to be the single point of entry for all human knowledge. This is somewhat troubling to me, and not because of copyright or other rights issues; instead, it seems to me that the endeavour should be undertaken by an agency with a transparency that I don't see in Google. I heartily suspect that there will be fees coming very soon..... The ability of academics to use these resources to assess and develop theories can only be assisted by the wealth of information that is rapidly coming online. I have no love of quantitative history, and this is, I suspect, always going to be an issue for humanists whenever massive amounts of computerized information are brought to their inquiries. It is always up to us to frame our own questions and direct the path of our own research; the digitized texts and the massive online libraries, search engines, pattern matching bots and other programs designed to help us filter out the mass of raw data will only help us become better historians. Dave

I thought I would quickly jump in and comment on a few of the points being made here. First off, Ryan and Melanie, I entirely agree with both of you that much of this debate is a matter of methodology. As I've mentioned in my blog and in class, I've got a number of concerns about which methods scholars use when adapting to and integrating digital resources and results into traditional Humanist research. However, I'm not entirely convinced that the amount of information becoming available necessarily translates into better scholarship or that academics will be able to find the appropriate resources. That said, I take Dave's point that finding resources has always come down to the scholar. In this he's right: whether the material is digitized or remains available only in print, libraries primarily help improve access to the resources. However, there is something to be said about the relationship between ease of access and the resources which get used..... This is of course a matter we can take up in class this week as it certainly deserves more attention.

Thanks to everyone who has posted and discussed the material so far. Sean

I will address the fourth question. I do not believe that Digital Humanities will lead to an overshadowing of meaning by "statistical measures." Yes, based on Cohen's article "Analyzing Literature by Words and Numbers," the quantitative analysis of texts is gaining attention throughout the academic world, but scholars still have to define the meaning of the data. In her article, Cohen gives the example of the decline in the use of "God" in the 19th century, a change that correlates with the rise of skepticism at the time. There is the quantitative data, but we still need to interpret the meaning of the results. The interpretations made by humanists will be supported by statistics, rather than being overshadowed by them. For example, if a historian made the argument that liberty was a major theme in literature from colonial America, then they could collect all the times "liberty" or "freedom" were mentioned in books from the time period to support their thesis. However, this data would still need to be interpreted and given extra meaning. Why was liberty such a big deal to Americans in the 18th century? What events led to this increase in interest in freedom? Meaning still plays a prominent role in the humanities, even with the advent of statistical data. As well, certain aspects of academic study in the humanities cannot be done with quantitative data. Tracing the history of a certain region or person cannot use statistics, but rather relies on a historian's interpretation of the primary sources. Archaeology is a field in which the analysis and interpretation of the findings play the primary role. Statistics may be used to record the number of a certain artefact, such as arrow heads, but an archaeologist must then interpret what the arrows were used for, who used them and why they were found in that region. Those are questions that statistics cannot immediately answer. Thus, I believe that statistical data will work alongside meaning in the humanities, rather than overshadow it. Grant

Post-Presentation Distance Reading Wiki Notes
