Building an Orientalist Corpus - ENGL256D

Some archives document major historical events, some preserve photographs and others hold art. But when historical events, photographs and art are housed together in archives, they become a treasure trove. Archives preserve a unique gem: CULTURE.

However, is it enough to trace history without understanding the story?

Is it enough to have access to orientalists’ books, documents, paintings and much more without understanding the context of their writings? Their intentions? The messages they implied?

Visiting the archives was not a random quest. We wanted to learn more about Fantasy within the scope of Orientalism. Through our findings, we noticed that analyzing one specific work was not enough to construct a general picture: the attitude we wanted to describe was tied to major traits of Orientalism as a whole. To depict writers’ behaviors across different periods and genres, we had to build a corpus.

Building an orientalist corpus is essential to recognizing major patterns. It is like trying to understand why a nation suffers from a certain disease: you take a large enough random sample, apply statistical tools and then test a hypothesis. Our work follows the same logic: build a large orientalist corpus, apply analytical tools and then deduce behaviors.

A screenshot of a few Orientalist books in the corpus

How did we prepare the corpus? What are the steps?

Building an Orientalist corpus is like assembling the main body of Western perceptions of the East, and it requires compiling a lot of data. Hence, the first decision to make when constructing a corpus is: what data should be stored in it?

In the scope of our study, we decided to store orientalist writings. The second challenge we faced was agreeing on where the data should be stored: we had to choose a safe and accessible location. With the help of our professor, we started organizing the files in a Google Drive folder and created an Excel sheet for all the necessary information. We broke down all relevant aspects of the writings into 8 categories:

Author’s name (linked to Wikipedia page)
Author’s nationality
Author category
Text category
Title of work (Linked to text)
Title of file in drive
Date of publication
Other notes

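Our sheet lived in Excel, but the same eight columns can be sketched as a simple record type, which makes it easy to export the catalog later for analysis. This is a minimal Python sketch under our own assumptions: the class name `CorpusEntry`, its field names and the helper `entries_to_csv` are hypothetical and only mirror the categories listed above; nothing here is the course's actual tooling. Only the standard-library `csv`, `io` and `dataclasses` modules are used.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class CorpusEntry:
    """One row of the corpus sheet (hypothetical field names for the 8 categories)."""
    author_name: str          # shown as a link to the author's Wikipedia page in the sheet
    author_nationality: str
    author_category: str
    text_category: str
    title_of_work: str        # shown as a link to the online text in the sheet
    file_title: str           # name of the .txt file in the Drive "corpus" folder
    publication_date: int     # year of publication
    other_notes: str = ""


def entries_to_csv(entries: list[CorpusEntry]) -> str:
    """Serialize entries to CSV text: one header row, then one row per entry."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([f.name for f in fields(CorpusEntry)])
    for entry in entries:
        writer.writerow(astuple(entry))
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative row: categories we cannot confirm are left empty rather than guessed.
    harvey = CorpusEntry(
        author_name="Harvey",
        author_nationality="",
        author_category="",
        text_category="",
        title_of_work="Turkish Harems and Circassian Homes",
        file_title="1871Harvey_TurkishHaremsandCircassianHomes.txt",
        publication_date=1871,
    )
    print(entries_to_csv([harvey]))
```

The resulting CSV text can be opened in Excel or Google Sheets, so the spreadsheet and any analysis scripts can share one catalog.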
After creating this sheet, we created a folder named corpus in which we would post the texts. Up to this step, all decisions had been made by our professor; our team was mostly responsible for filling the directory. All the writings were found in the Project Gutenberg database. We converted all the texts to plain text, since this format is more practical for textual analysis. When uploading the texts to Drive, we used a unified naming convention for the files: DateAuthorSurname_Writing’sTitle.txt (e.g. 1871Harvey_TurkishHaremsandCircassianHomes.txt).
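The naming convention is mechanical enough to automate, which helps keep the Drive folder consistent as more texts are added. Below is a minimal Python sketch, assuming (our assumption, not a rule stated in the course) that spaces and punctuation are simply dropped and the words run together with their original capitalization. The helpers `squash`, `corpus_filename` and `is_valid_filename` are hypothetical names introduced for illustration.

```python
import re

# Four-digit year, surname, underscore, title, ".txt"; surname and title are runs of
# letters/digits only (no spaces, punctuation or underscores).
FILENAME_PATTERN = re.compile(r"(\d{4})([^\W_]+)_([^\W_]+)\.txt")


def squash(text: str) -> str:
    """Drop spaces, punctuation and underscores, keeping letters (incl. accented) and digits."""
    return re.sub(r"[\W_]+", "", text)


def corpus_filename(year: int, author_surname: str, title: str) -> str:
    """Build a name of the form DateAuthorSurname_Writing'sTitle.txt."""
    return f"{year}{squash(author_surname)}_{squash(title)}.txt"


def is_valid_filename(name: str) -> bool:
    """Return True only if the whole name follows the convention."""
    return FILENAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    name = corpus_filename(1871, "Harvey", "Turkish Harems and Circassian Homes")
    print(name)                     # 1871Harvey_TurkishHaremsandCircassianHomes.txt
    print(is_valid_filename(name))  # True
```

Running the check over every file name in the folder would flag any upload whose name drifted from the pattern, for instance one that still contains a space.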
