Reflection piece

The Digital Histories module has introduced me to many aspects of the digital, both in how to undertake more focused research online and in understanding the large number of factors that go into constructing and maintaining digital history projects, which in turn become the online resources where new and innovative research takes place.  The course has changed the way in which I gather information and interact online, in particular through blogging on WordPress and through Twitter.  WordPress has allowed for the distribution of various types of information, regardless of the source, and has proved a useful tool for gathering information online and responding to other people’s posts.  Its facilities allow me to comment on, ‘re-blog’ and ‘like’ posts that I find interesting and relevant to digital humanities (in particular digital histories).  I have found that Twitter is perhaps the better tool for gathering real-time information, as well as serving as a gateway to other online resources and blogs.  This could be due to Twitter’s 140-character limit, which works to its advantage by making interaction quick and efficient.  Perhaps most importantly, the course has introduced me to different types of online projects that provide new and innovative ways of gathering and portraying data, a particular favourite of mine being the Google N-gram Viewer.  It has also introduced me to the concepts of big data, crowdsourcing and open access in relation to ongoing debates within the digital humanities sphere.
A particularly useful element of the module, and one of its key themes, was working through the thought processes involved in creating and implementing an online digital history project.  Such a project can avoid the standard ‘book’ type analysis of history and simply give users access to the sources, allowing them to form a more personal analysis rather than having an author’s conclusion forced upon them.

How does the digital change the nature of historical research?

  1. How does the digital change the nature of historical research? (Assignment) 

The nature of historical research has changed drastically (especially within the last decade) and will continue to evolve as time progresses.  This is undoubtedly due to the arrival of the ‘digital’, which has had profound effects on historical research in a variety of ways.  The introduction of the digital has encountered ‘road blocks’ and has some limitations; however, an analysis of its overall impact shows that the digital change has not only made historical research easier and more accessible, but has and will continue to innovate and create variety in the way people undertake and interpret historical research itself.

Interpreting the changes that the digital has brought is complicated, not least because of the constant changes within the digital world itself; for the same reason, the term ‘digital history’ is difficult to define.  However, categorising the most identifiable recurring forms that the digital can take allows a definition in the broadest sense of the word.  The term digital history can therefore be defined as the representation of all information available online, regardless of the format or the way in which it is presented.  In its most recognisable and basic forms, this would be online archives, libraries and journals, as well as more recent formats such as social networks and online communities.  Examining the advantages and disadvantages of these formats allows us to better realise the extent to which the digital has changed the nature of historical research, from both a practical and a philosophical standpoint.

Firstly and most obviously, the digital change has meant that historical data, including all different types of sources, has been accumulated and made accessible through the internet, and has therefore become more accessible to the masses.  This, combined with the arrival of ‘big data’, would, if left unaddressed, leave users facing ‘historical archives of almost unimaginable abundance’ [1].  As a result, the digital offered and continues to offer new ‘sophisticated methods for finding trends and anonymities’ [2], and therefore solutions for dealing with big data and then portraying that data in a variety of ways.  An example of such a large archive is Google Books, which currently contains more than 50 million books [3].  The majority of these books (previously analogue) have been scanned and digitised using OCR technology.  This is just one of many examples that highlight the extensive ‘possibilities for online research and teaching that would have been unimaginable just a few years ago’ [4], possibilities that will continue to develop as time progresses, both in scale and in nature.

However, some people have been wary of embracing the digital changeover in relation to historical research and research in general.  Abby Smith suggests that ‘we should be cautious about letting the radiance of the bright future blind us to the limitations of this new technology’ [5].  This refers to digital data being only a sample of the original (analogue) data, which is not necessarily the case any more, but it remains a valid limitation on the impact that the digital changeover has had on historical research.  The limitation is debatable because digital data need not be any different from analogue data and can be just as ‘continuous’; it simply depends on the amount of digital data gathered, and scale has become less of an issue.  Although perhaps not raised directly, a related limitation is the ongoing issue of ‘open access’ versus ‘paid access’, where the latter by its very nature restricts access to data online.  However, this is really no different from the problems faced prior to the digital age, where ‘in traditional scholarship, scarcity was the problem, travel to archives was expensive, access to elite libraries was gated and resources were difficult to find’ [6].  The digital changeover has in reality highlighted an already existing problem of restricted data, be it analogue or digital: because so many resources are now available, those that are restricted become more and more noticeable.  This in turn has led to the ‘emergence of new rights regimes (open access, open content and open source), and the explosion of new information are manifestations of these changing costs’ [7].

It seems that the possibilities for historical research through use of the digital are vast.  The new structure of the web has allowed, and will continue to allow, more advanced and alternative ways for students to approach historical research.  When computers were first introduced, historians recognised that ‘computers’ healthiest influence in history thus far has been the deepening and broadening of professional conversation’ [8]; however, it seems this influence has now reached a new peak, not just in encouraging more debate and conversation but in creating (perhaps not deliberately) a new type of historical source through social media.  The way that people in general have begun to use social media is in the process of changing how historians will undertake historical research in the future.  This is evident in the British Library’s attempt to, in a sense, archive the web ‘in a bid to preserve the nations digital memory’ [9].  The British Library has expressed its urgency in providing the archive, noting that ‘ever since people began switching from paper and ink to computers and mobile phones, material that would fascinate future historians has been disappearing into a digital black hole’ [10].  This suggests just how important the digital changeover has become to historical research: archiving web pages and social feeds will provide useful historical resources in the future and, in doing so, create a new type of source.  It also highlights some of the problems brought by the digital change, as attempts to cater for the future can be difficult due to the ‘inherently unstable nature of the web, and that information constantly mutates, and search engines’ algorithms can change results and prices in an instant’ [11].
There are therefore real difficulties in embracing the digital change, which in many ways involves attempting to predict the future and new innovations while keeping up to date, not to mention the costs of such projects.

There are also issues raised by the digitisation of text.  OCR technology, however much improved it may now be, does not remove the need to be wary of the ‘density of data collected’ [12] when digitising from an analogue state.  This relates to what has been termed the ‘faithful representation’ of the original text or source, which could include handwritten notes or card catalogues [13]; these were often disposed of, leaving a digital version that is not an accurate representation of the original source.  However, it is also true that technological advances have ‘improved greatly in the ability to make faithful digital surrogates whilst in reducing the costs of doing so’ [14], and by the very nature of the digital will continue to improve until digital copies are identical to their analogue counterparts.

What should be considered important are the new ways of looking at and analysing data, which would not have been possible without the digital change.  Many of the most successful and useful projects have been those that apply traditional methods of comparing and contrasting data within archives, using the technology available to extract accurate and useful data that would have been near impossible to retrieve from the physical archive itself.  Examples can be seen in projects that use text mining, in which ‘high-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning’ [15].  A large-scale example is Google’s N-gram Viewer, which extracts data from the Google Books archive to chart the frequency of words or phrases, termed ‘n-grams’, across around 6% of all books ever published, and is therefore ‘capable of precisely and rapidly quantifying cultural trends based on massive quantities of data’ [16].  Based on the data charted, interpretations can be made and conclusions drawn that would otherwise have been unobtainable.  This is also evident in websites such as the ‘Old Bailey Online’, in which ‘197,745 criminal trials’ are archived and text mining allows users to ‘compare patterns of persecution over time and further examine changes in court behaviour and procedure’ [17].  This would have been impossible without the project and is therefore a direct consequence of the digital change, highlighting its importance in changing the nature as well as the outcome of historical research.

The digital has arguably made history more accessible to the public, as more people can research what interests them in a much easier and more interactive environment, which can also give users a greater understanding of sources or data, as well as of the immense amount of data available, theoretically all in one place.  Alongside this is digital searching, an advantage that is often overlooked by those who have become accustomed to it: ‘digital searching most dramatically transforms access to collections. This ‘finer grained’ access will revolutionize the way historians do research’ [18].  It allows immediate access to the relevant data and has changed the very way in which historical research is approached and completed.  The disadvantages, if any exist, are that the sheer amount of data available can in some circumstances overwhelm users, and that using search functions to pick out key words or phrases can make the context of the history more difficult to understand, although this is not always the case.  The amount of material available also means that the quality of secondary source material varies considerably.

As a result, the digital change has had and continues to have an immeasurable impact on the nature of historical research, not just through the tangible advantages it brings, but because it has changed and will continue to change the very culture of historical research itself, both in the way that research is approached and in the way that data is perceived.  It has also increased the popularity of digital humanities [19], whilst making research easier and more interesting, despite the road blocks that have been met and will occur along the way.  There is no doubt that the digital change has affected the nature of historical research for the better, whilst maintaining the fundamental principles embedded in the way historical research is undertaken, and it will be interesting to continue to observe the new innovations that shape the nature of research as time progresses.

(Word count: 2200)

References

[1] – Dan Cohen blog, http://www.dancohen.org/2012/02/08/digital-journalism-and-digital-humanities/ (digital journalism and digital humanities).

[2] – Dan Cohen blog, http://www.dancohen.org/2012/02/08/digital-journalism-and-digital-humanities/ – accessed 01/04/13

[3] – Google books (en.wikipedia.org/wiki/Google_Books#2013) accessed 01/04/13

[4] – Daniel J. Cohen and Roy Rosenzweig, Digital History, A guide to gathering, preserving and presenting the past on the web, http://chnm.gmu.edu/digitalhistory/digitizing/1.php accessed 01/04/13

[5] – Daniel J. Cohen and Roy Rosenzweig, Why Digitize the Past? Costs and Benefits, http://chnm.gmu.edu/digitalhistory/digitizing/1.php, accessed 01/04/13

[6] – William J. Turkel, Going Digital, http://williamjturkel.net/2011/03/15/going-digital/

[7] – William J. Turkel, Going Digital, http://williamjturkel.net/2011/03/15/going-digital/

[8] – Edward L. Ayers, The Pasts and Futures of Digital History, http://www.vcdh.virginia.edu/PastsFutures.html

[9] – Time Tech blog, British Library Sets Out to Archive the Web
http://techland.time.com/2013/04/04/british-library-sets-out-to-archive-the-web/#ixzz2QMRfU1fb

[10] – Time Tech blog, British Library Sets Out to Archive the Web
http://techland.time.com/2013/04/04/british-library-sets-out-to-archive-the-web/#ixzz2QMRfU1fb

[11] – Time Tech blog, British Library Sets Out to Archive the Web
http://techland.time.com/2013/04/04/british-library-sets-out-to-archive-the-web/#ixzz2QMRfU1fb (Tenner quote)

[12] – Daniel J. Cohen and Roy Rosenzweig, Why Digitize the Past? Costs and Benefits, http://chnm.gmu.edu/digitalhistory/digitizing/1.php, accessed 01/04/13

[13] – Daniel J. Cohen and Roy Rosenzweig, Why Digitize the Past? Costs and Benefits, http://chnm.gmu.edu/digitalhistory/digitizing/1.php, accessed 01/04/13

[14] – Daniel J. Cohen and Roy Rosenzweig, Why Digitize the Past? Costs and Benefits, http://chnm.gmu.edu/digitalhistory/digitizing/1.php, accessed 01/04/13

[15] – Text Mining,  http://en.wikipedia.org/wiki/Text_mining

 

[16] – Culturomics, http://www.culturomics.org/Resources/A-users-guide-to-culturomics

[17] – https://historyspot.org.uk/podcasts/digital-history/text-mining-old-bailey-proceedings

[18] – Daniel J. Cohen and Roy Rosenzweig, Why Digitize the Past? Costs and Benefits, http://chnm.gmu.edu/digitalhistory/digitizing/1.php, accessed 01/04/13

[19] – Geoffrey Rockwell, Inclusion in the Digital Humanities
http://www.philosophi.ca/pmwiki.php/Main/InclusionInTheDigitalHumanities

Bibliography

–       Dan Cohen blog, http://www.dancohen.org/2012/02/08/digital-journalism-and-digital-humanities/ (digital journalism and digital humanities).

–       Daniel J. Cohen and Roy Rosenzweig, Digital History, A guide to gathering, preserving and presenting the past on the web, http://chnm.gmu.edu/digitalhistory/digitizing/1.php

–       William J. Turkel, Going Digital, http://williamjturkel.net/2011/03/15/going-digital/

–       Edward L. Ayers, The Pasts and Futures of Digital History, http://www.vcdh.virginia.edu/PastsFutures.html

–       Time Tech blog, British Library Sets Out to Archive the Web
http://techland.time.com/2013/04/04/british-library-sets-out-to-archive-the-web/#ixzz2QMRfU1fb

–       Culturomics, http://www.culturomics.org/Resources/A-users-guide-to-culturomics

–       https://historyspot.org.uk/podcasts/digital-history/text-mining-old-bailey-proceedings

–       Geoffrey Rockwell, Inclusion in the Digital Humanities
http://www.philosophi.ca/pmwiki.php/Main/InclusionInTheDigitalHumanities

Time Magazine Cover (2006)

authentic persuasion

As the Time Magazine Editor stated in this edition:

“The new Web is a very different thing. It’s a tool for bringing together the small contributions of millions of people and making them matter. Silicon Valley consultants call it Web 2.0, as if it were a new version of some old software. But it’s really a revolution.
(…)

Who are these people? Seriously, who actually sits down after a long day at work and says, I’m not going to watch Lost tonight. I’m going to turn on my computer and make a movie starring my pet iguana? I’m going to mash up 50 Cent’s vocals with Queen’s instrumentals? I’m going to blog about my state of mind or the state of the nation or the steak-frites at the new bistro down the street? Who has that time and that energy and that passion?

The answer is, you do. And for…

Google N-Gram Viewer Critique

Google books N-Gram Viewer Critique

Jon Orwant and co-creator Will Brockman, along with a team of engineers, launched the Google N-gram Viewer, an extension of Google Books, in December 2010.  It is essentially a graphing tool that displays the yearly count of selected n-grams (words or phrases)[1].  It extracts data from more than eight million of the 20 million books in the Google Books archive, an estimated 6% of all books ever published, spanning 1500 to 2008.  Containing approximately 500 billion words [2] in British English, American English, French, German, Russian, Spanish, Hebrew and Chinese, the database is extensive to say the least.
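To make the idea of a ‘yearly count of n-grams’ more concrete, here is a minimal Python sketch of how such counts could be built from dated texts.  It is only an illustration and not Google’s actual pipeline: the toy corpus, the function names and the choice to divide by each year’s total (so that years with more books stay comparable) are all my own assumptions.

```python
from collections import Counter, defaultdict

def ngrams(text, n):
    """Return the n-grams of a text as space-joined, lower-cased strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def yearly_counts(corpus, n):
    """Map each year to a Counter of the n-grams found in that year's texts."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        counts[year].update(ngrams(text, n))
    return counts

def relative_frequency(counts, phrase, year):
    """Share of a year's n-grams taken up by `phrase` (0.0 for an empty year)."""
    total = sum(counts[year].values())
    return counts[year][phrase] / total if total else 0.0

# Hypothetical (year, text) pairs standing in for scanned books.
TOY_CORPUS = [
    (1800, "the best of times the worst of times"),
    (1800, "the best laid plans"),
    (1900, "the telephone rang and the telephone rang again"),
]

unigram_counts = yearly_counts(TOY_CORPUS, 1)
print(unigram_counts[1800]["best"])
print(relative_frequency(unigram_counts, "the", 1800))
```

Plotting `relative_frequency` for one phrase across every year would give the kind of line the viewer draws.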

The n-grams are gathered from Google Books, which relies on OCR (optical character recognition) as its main technology.  This represents the first possible ‘limitation’ of the N-gram Viewer, because OCR errors can and do occur, such as the word ‘internet’ appearing before the 1950s.  Google addresses this in its frequently asked questions section by saying that they do a good job at filtering out books with low OCR quality scores, but some errors do slip through [3].  The problem also shows in letters being misread by the OCR technology.  Especially interesting are examples of the ‘long medial s’[4], which in fairness looks almost identical to the modern ‘f’; a common example appears in the heading of the American Bill of Rights, as shown below.

[Screenshot: heading of the American Bill of Rights, showing the long medial s]

This consequently raises concerns over the accuracy of some of the graphs produced by the N-gram Viewer, even though Google is combating the issue.  However, an example given by Google of the evident improvements in its OCR technology is shown in the graph below, which compares results for the word ‘beft’ (the OCR technology’s misreading of ‘best’) and shows a significant improvement in 2012.

[Graph: ‘beft’ in the 2009 and 2012 versions of the data]
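The long s problem can be pictured with a small Python sketch.  This is a simple dictionary-based heuristic of my own, not the method Google uses: the function names and the tiny vocabulary are hypothetical, and a real correction step would need a far larger word list and some awareness of context.

```python
from itertools import product

def long_s_candidates(token):
    """Yield every spelling made by turning some of the token's 'f's into 's'."""
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        yield "".join(chars)

def correct_long_s(token, vocabulary):
    """Return the corrected word when exactly one candidate is a known word."""
    if token in vocabulary:
        return token  # already a real word, so leave it alone
    matches = [c for c in long_s_candidates(token) if c in vocabulary]
    # Ambiguous or unknown tokens are returned unchanged rather than guessed.
    return matches[0] if len(matches) == 1 else token

# Hypothetical vocabulary, for illustration only.
VOCABULARY = {"best", "left", "congress", "soft"}

print(correct_long_s("beft", VOCABULARY))
print(correct_long_s("congrefs", VOCABULARY))
print(correct_long_s("left", VOCABULARY))
```

The sketch deliberately leaves genuine words such as ‘left’ untouched, which is the hard part of the problem: not every ‘f’ in an old book is a misread ‘s’.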

The N-gram Viewer is essentially a ‘text mining tool’ that uses the data supplied by Google Books to let users identify trends over time.  Through using it, you begin to realise the immense scale of the Google Books archive, and that the viewer presents an extensive amount of both primary and secondary source material clearly and instantly.  It is therefore an extremely useful and addictive visual tool, which in its most basic form allows users to track the use of words through time.  The metadata supplied through Google Books now allows users to search for every instance of the n-gram by time period and to reach the original source material through hyperlinks to books within the Google Books archive, as shown below.  Google also clearly explains the various part-of-speech tags that are available to search, which greatly eases the usability of the site (also shown below).

[Screenshot: hyperlinks from the N-gram Viewer to books in the Google Books archive]

[Screenshot: Google’s explanation of the part-of-speech tags]

Because the data is extracted straight from Google Books, the viewer provides reliable, useful and relevant sources.  However, due to copyright laws, many of the books are at present available only as previews rather than in full; this is a legislative problem rather than a direct fault of the N-gram Viewer.  It also seems that the ‘best’ sets of data, in terms of correlating n-grams, come after the 1800s, although that is not to say that searches from 1500 onwards do not produce data that may be both useful and interesting to potential users.

The N-gram Viewer is extremely user friendly and accessible, and the sheer size of the database that Google has at its disposal allows for endless possibilities in how people can and do use it, which is reflected in the volume of searches, at 50 a minute.  The popularity of the N-gram Viewer speaks volumes, and the individuality and innovation of a project such as this represents the future of digital history online, most importantly by keeping people engaged in history in a new and creative way through a resource that would otherwise not have been available.  I therefore highly recommend the Google N-gram Viewer, not only as a useful tool for providing historical perspective through time, but also as an addictive invention whose endless potential anyone can explore.

References

[1] – http://en.wikipedia.org/wiki/Google_Ngram_Viewer

[2] – http://libweb.lib.buffalo.edu/pdp/index.asp?ID=497

[3] – http://searchengineland.com/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181

[4] – http://en.wikipedia.org/wiki/File:Bill_of_Rights_Pg1of1_AC.jpg

[5] – http://www.culturomics.org/Resources/A-users-guide-to-culturomics

[6] – http://googleresearch.blogspot.co.uk/2012/10/ngram-viewer-20.html

Google Ngram Viewer

First look at Google Books Ngram viewer, really interesting feature from Google.

Launched in December 2010, the Google N-gram Viewer was created by Jon Orwant (Engineering Manager) and co-creator Will Brockman for the purpose of tracking the usage of phrases across time, and was expected to be of interest to professional linguists and historians.  However, it also became very popular with casual users, and has since been used about 50 times every minute to explore how phrases have been used in books spanning the centuries.

–       Google N-gram Viewer is a graphical tool which charts the usage of words and phrases, or ‘N-grams’, based on their yearly count within 5.2 million books spanning 1500 to 2008 and containing approximately 500 billion words.  It also covers a variety of languages, such as American English, British English, French, German, Spanish, Russian and Chinese.  As a result, over 45 million graphs have been created; I would say the best way to describe Google N-gram Viewer is as a history of the written word.

–       As of 2012, Ngram Viewer has been updated to version 2.0, which extracts data from more than eight million of the 20 million books in the Google Books archive, approximately 6% of all books ever published.  The upgraded Ngram Viewer 2.0 includes improvements made by the engineers at Google in addressing OCR deficiencies and in hammering out inconsistencies between library and publisher metadata.

Advanced usage of Google Ngram

 

–       Part-of-speech tagging.  Words can be searched with a part-of-speech tag, so that the same word used with different meanings can be told apart and changes in usage can be followed, for example verbs such as telephone_VERB and phone_VERB.

–       Set of mathematical operators allowing you to add, subtract, multiply, and divide the counts of Ngrams.
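The effect of combining counts can be sketched in a few lines of Python.  This does not reproduce the viewer’s own query syntax; the yearly figures are invented and the function names are hypothetical, so it only shows what adding or dividing two series of yearly counts means.

```python
import operator

def combine_counts(series_a, series_b, op):
    """Apply `op` year by year; a year missing from one series counts as 0."""
    years = sorted(set(series_a) | set(series_b))
    return {year: op(series_a.get(year, 0), series_b.get(year, 0)) for year in years}

def safe_divide(a, b):
    """Divide a by b, returning None where the denominator is zero."""
    return a / b if b else None

# Invented yearly counts, for illustration only.
TELEPHONE = {1900: 40, 1950: 90, 2000: 30}
PHONE = {1950: 10, 2000: 60}

# Total usage of either verb, then the share of that total taken by 'phone'.
combined = combine_counts(TELEPHONE, PHONE, operator.add)
phone_share = combine_counts(PHONE, combined, safe_divide)

print(combined)
print(phone_share)
```

Dividing one series by the sum of both is a simple way to see one form of a word overtaking another, which is the kind of question the operators are useful for.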

Limitations

–       Due to limitations on the size of the Ngram database, only n-grams found in at least 40 books are indexed; otherwise the database could not have stored all possible combinations.

–       Typically, search-terms cannot end with punctuation, although a separate full stop, or period, can be searched. Also, an ending question mark (as in “Why?”) will cause a 2nd search for the question mark separately.

–       Once relevant books are found, often the whole book is not available for reading, either due to copyright laws or otherwise; however, this is more a problem with copyright laws in relation to Google Books than with the Ngram Viewer itself.

–       Only 6% of books ever published?  I am not sure this can be seen as a limitation, as in reality it is an immense collection.
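The 40-book threshold mentioned in the first limitation can be sketched as a simple filter.  This is a hedged illustration rather than Google’s implementation: the function name, the toy data and the decision to count each book only once per n-gram are my own assumptions.

```python
from collections import Counter

def indexed_ngrams(books, min_books=40):
    """Return the n-grams that occur in at least `min_books` different books."""
    book_counts = Counter()
    for book_ngrams in books:
        # set() ensures a book is counted once however often it repeats an n-gram.
        book_counts.update(set(book_ngrams))
    return {gram for gram, seen_in in book_counts.items() if seen_in >= min_books}

# Hypothetical data: each inner list holds the n-grams found in one book.
TOY_BOOKS = [
    ["best", "of", "best"],
    ["best"],
    ["of", "beft"],
]

# A threshold of 2 is used here only because the toy collection is tiny.
print(sorted(indexed_ngrams(TOY_BOOKS, min_books=2)))
```

A filter like this also explains why very rare words, including many one-off OCR errors, never appear in the graphs at all.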