Since many of you are a bit lost on how to write your term paper in R Markdown, I will talk briefly about it. The structure of this session is as follows: in section 1.1, I talk a bit about cross-referencing. Do you know what it means? In section 1.2, I talk about which elements to include in the header of your file (YAML). In section 1.3, I talk about how to give bibliographical references. Section 1.4 talks a bit about how to make more beautiful html or pdf outputs, and finally in section 1.5, you learn about deliverables. By deliverables, I mean what to include in the final output you send to me.
Definition: Cross-referencing is the process of linking various parts of a document, like figures, tables, and sections. For instance, in the above paragraphs, I have created clickable links to different sections of this document. Cross-referencing increases the readability of documents.
To do cross-referencing in R Markdown, follow these steps:
1- Install the bookdown package. This package extends R Markdown to support cross-referencing.
#install.packages("bookdown")
2- Change your output format in your YAML. In a simple document, we have for instance output: html_document as our output format. We need to change this format to a bookdown-compatible format such as bookdown::html_document2 or other bookdown formats.
3- Cross-referencing syntax: The idea behind cross-referencing is very simple. You give a label to a section, figure, or table, and then you use that label to refer to it.
You can manually assign a label to a section with the syntax {#label}, placed right after the section title, for instance {#sec:cross-section}. You can then reference that section with \@ref(label), which is rendered as the section number, such as 1.1.1.
To make a figure referenceable, give its code chunk a label and set the figure caption with the fig.cap chunk option. You then reference the figure with \@ref(fig:label), where label is the chunk label. For example: As shown in figure 1.1, we see bla bla. The cool thing is that the numbering of figures and tables is automatic.
plot(cars)
Figure 1.1: My Plot Caption
Check for yourself how to cross-reference a table.
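To try the section and figure syntax described above in isolation, here is a minimal sketch that writes a tiny R Markdown file from R and knits it if bookdown and pandoc are available. The file name crossref_demo.Rmd and the labels intro and cars-plot are my own illustrative choices, not something from the course materials.

```r
# Minimal sketch: write a tiny R Markdown file that uses bookdown cross-references.
# The file name and the labels "intro" and "cars-plot" are illustrative assumptions.
fence <- strrep("`", 3)  # three backticks, i.e. the delimiter of a code chunk

rmd_lines <- c(
  "---",
  "title: \"Cross-referencing demo\"",
  "output: bookdown::html_document2",
  "---",
  "",
  "# Introduction {#intro}",
  "",
  "Figure \\@ref(fig:cars-plot) is discussed in section \\@ref(intro).",
  "",
  # The chunk label (cars-plot) plus fig.cap make the figure referenceable
  paste0(fence, "{r cars-plot, fig.cap=\"My Plot Caption\"}"),
  "plot(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "crossref_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit only if bookdown and pandoc are available on this machine
if (requireNamespace("bookdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Opening the generated file in RStudio shows the pieces side by side: the {#intro} label after the heading, the chunk label with its fig.cap, and the two \@ref() calls in the text.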
When you write a term paper, there are some elements you would like to have, many of which do not appear automatically in the output generated by R Markdown. For instance, you may want a table of contents, or you may want your sections to be numbered automatically. These things are not there by default, but you can add them very easily with a few lines in your YAML.
output:
  bookdown::html_document2:
    toc: true
    toc_depth: 3
    toc_float: true
    number_sections: true
This configuration in your YAML specifies the output settings for your R Markdown document using the bookdown::html_document2 format. With these parameters, we ask the html output to include a table of contents (toc: true), to display up to three levels of headers in it (toc_depth: 3), to make the table of contents float so that it remains visible as you scroll through the document (toc_float: true), and to number the document’s sections (number_sections: true).
So here is something you can do for your term paper: copy the YAML of this R Markdown file and paste it into the R Markdown file of your term paper :-)
One of the questions I have received most frequently about the term papers is how to do citations in R Markdown. In fact, it is quite easy. All you need is a BibTeX file and the proper syntax for referencing. A BibTeX (.bib) file is a simple text file that you can open with any text editor. BibTeX is the bibliography format used with LaTeX documents, but you can also use it with other formats such as Markdown files.
You all know Google Scholar. When you search for an article there, you can copy the article’s reference information in BibTeX format and paste it into your .bib file. In the folder “2024_01_31_Session14”, you see a file called references.bib; this is where we keep our BibTeX entries.
Each of these BibTeX entries has a citation key. We use those keys for referencing in the text. Also, as you know, there are different citation styles such as MLA, APA, and Chicago. To use a different style, you can download its CSL file from here.
Examples:
Or you can simply type your references manually, the way you would in a Word document.
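As a small illustration of how citation keys work, the sketch below uses knitr::write_bib() to generate BibTeX entries for R packages, which is also a handy way to cite the packages you used. The file names packages.bib and apa.csl are assumptions for the example; the YAML fields and the citation syntax are shown as comments because they belong in the .Rmd file, not in R code.

```r
# Sketch: create a BibTeX file for the R packages you used, so you can cite them.
# "packages.bib" is an illustrative file name; in practice, write it next to your .Rmd.
bib_file <- file.path(tempdir(), "packages.bib")
knitr::write_bib(c("base", "knitr"), file = bib_file)

# Each entry starts with its type and citation key, e.g. @Manual{R-base,
bib_lines <- readLines(bib_file)
print(grep("^@", bib_lines, value = TRUE))

# In the YAML header of the paper you would then list the file(s), for example:
#   bibliography: [references.bib, packages.bib]
#   csl: apa.csl          # optional; the citation style file you downloaded
#
# In the running text you cite by key:
#   [@R-base]             parenthetical citation
#   @R-base               in-text (narrative) citation
#   [@R-base; @R-knitr]   several sources in one citation
```

The keys of the entries you copy from Google Scholar into references.bib are used in exactly the same way; only the key names differ.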
You might not be very happy with the html output you get. The good news is that there are other themes you can use either from the rmarkdown package itself or some extra packages.
R Markdown provides a wide range of built-in themes, such as “simplex,” “flatly,” “darkly,” and many more. Each theme has its own visual characteristics, allowing you to choose a style that best suits your content or purpose. These themes come with the rmarkdown package itself, so you don’t need to install any extra packages for them. If you want something extra, here are some packages you can use to customize the look of your output: rmdformats, prettydoc, hrbrthemes.
Procrastination tip: The themes available by default are: “default”, “bootstrap”, “cerulean”, “cosmo”, “darkly”, “flatly”, “journal”, “lumen”, “paper”, “readable”, “sandstone”, “simplex”, “spacelab”, “united”, “yeti”. You can specify the theme in your YAML, for example: theme: cosmo
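If you want to try out several themes quickly without editing the YAML each time, one option is to pass the same settings from R when rendering. This is only a sketch: term_paper.Rmd is a hypothetical file name and render_paper() is a helper I made up for illustration.

```r
# Sketch: the same settings as in the YAML header, passed from R instead.
# "term_paper.Rmd" is a hypothetical file name; replace it with your own file.
format_options <- list(
  toc = TRUE,              # table of contents
  toc_depth = 3,           # up to three heading levels
  toc_float = TRUE,        # floating table of contents
  number_sections = TRUE,  # numbered sections
  theme = "cosmo"          # one of the built-in themes listed above
)

# Hypothetical helper: build the bookdown output format, then knit the file
render_paper <- function(input, options = format_options) {
  fmt <- do.call(bookdown::html_document2, options)
  rmarkdown::render(input, output_format = fmt)
}

# Usage (commented out because the file name is hypothetical):
# render_paper("term_paper.Rmd")
# render_paper("term_paper.Rmd", modifyList(format_options, list(theme = "flatly")))
```

For the version you hand in, keeping the settings in the YAML header is still the simpler choice, because then the file knits the same way for everyone.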
If you are writing your term paper in R Markdown or Quarto (which I think most of you are), it is very important that I can run your code and arrive at the same results as you. It is therefore probably easiest to create your term paper as an R project that contains all your data (don’t forget about a good folder structure, such as separate folders for data, plots, etc. inside the same project folder). Please include your R Markdown file in addition to the html or pdf output you get after knitting the document. That being said, the deliverables are:
#inside {r} you can give a short name
Comment your code. Most of the time, you think you will remember what you have done, but in reality, after a week or so, it is difficult to remember :-) Also, commented code makes your work more reproducible.
Set global parameters for your code chunks, or set parameters individually on each code chunk. For instance, specify whether warnings should be shown in the final output, whether both your code and its output should be shown or the output alone would suffice, etc.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The chunk below has a name (the name comes directly after r), and we also set its warning parameter so that warnings are not shown. You can also set such parameters globally for the whole document: knitr::opts_chunk$set(echo=FALSE, message=FALSE, warning=FALSE)
library(tidyverse)
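The global setting mentioned above usually lives in the first code chunk of the document. Here is a minimal sketch of what that chunk could contain; the chunk name my-plot in the last comment is just an illustrative assumption.

```r
# Typically placed in the first code chunk of the document (often named "setup").
knitr::opts_chunk$set(
  echo    = FALSE,  # hide the code, show only its output
  message = FALSE,  # hide messages such as the tidyverse start-up text
  warning = FALSE   # hide warnings
)

# Inspect the defaults that now apply to every chunk
knitr::opts_chunk$get(c("echo", "message", "warning"))

# A single chunk can still override the defaults in its header, for example:
#   {r my-plot, echo=TRUE}
```

Options set in an individual chunk header take precedence over the global ones, so you can hide code by default and still show it for the one chunk you want to discuss.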
A lot of functions from tidyverse for handling your data
Some basics of webscraping, regular expressions, automatic annotation, data visualization, hierarchical structures such as html and xml
Read and write your data in/from R (read_csv(), write_tsv(), read_rds(), etc. )
Basic tidyverse functions: mutate(), select(), filter(), rename(), arrange()
My two absolute favorites: group_by(), summarise()
Very useful: ifelse() syntax
string cleaning and processing functions: str_c(), paste0(), grepl(), str_detect(), str_extract(), str_remove(), str_replace()
Other useful ones: (1) merge and join functions, e.g., merge(), left_join(), inner_join(), (2) lag() and lead(), (3) slice()
for-loops (I may write a short tutorial on them and put it on ILIAS). Learn for-loops, but use them only when absolutely necessary. In more than 90% of my coding, I do not need them (there are usually alternative, easier solutions).
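To show how several of the functions listed above fit together, here is a small sketch on made-up toy data (the tokens table and its column names are purely hypothetical). It also contrasts a for-loop with its vectorized alternative.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: one row per word, with the speaker and a frequency
tokens <- tibble(
  speaker = c("A", "A", "B", "B", "B"),
  word    = c("running", "walk", "singing", "talked", "go"),
  freq    = c(3, 5, 2, 4, 6)
)

summary_tbl <- tokens %>%
  # ifelse() + str_detect(): flag words that end in "ing"
  mutate(form = ifelse(str_detect(word, "ing$"), "ing-form", "other")) %>%
  # keep only words with a frequency above 2
  filter(freq > 2) %>%
  # one summary row per speaker and word form
  group_by(speaker, form) %>%
  summarise(n_words = n(), total_freq = sum(freq), .groups = "drop") %>%
  arrange(speaker, form)

print(summary_tbl)

# The same task with and without a for-loop: number of characters per word
lengths_loop <- integer(nrow(tokens))
for (i in seq_len(nrow(tokens))) {
  lengths_loop[i] <- nchar(tokens$word[i])
}
lengths_vec <- nchar(tokens$word)  # vectorized: one call, no loop

identical(lengths_loop, lengths_vec)
```

Most R functions accept a whole column at once, which is why the loop is rarely needed: the vectorized call does the same work in a single line.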
Manual annotation: I would’ve loved to talk a bit about my favorite manual annotation software for textual data (WebAnno and INCEpTION). They are very easy to use, and there are good YouTube videos on their functionalities.
How to query existing corpora using Sketch Engine. It is super useful and not too difficult to learn.
My initial plan was to use the last 2-3 sessions for purely linguistic analysis (similar to the text analysis we did today), such as cluster and correspondence analysis, lexical similarity analysis, concordancing, collocation analysis, and topic modeling. This did not happen, because the first couple of sessions took us a bit longer than planned. At some point, I also felt that it would be more useful to learn data processing in R on a more general level. If you learn useful functions, you can easily apply them to any type of data. And it is always empowering to be able to process your data, be it linguistic or non-linguistic. However, I really hope you will have more linguistically focused courses with R in the future. In any case, there are a lot of resources you can use to learn different types of analysis. I introduced some in the second session of the course. Check the slides.
In my opinion, the most important step in R programming is to bring your data into R. The rest happens :-)
You heard about a lot of functions during this course. However, you won’t learn R (or any other programming language) until you get your hands dirty with coding. PLEASE DO NOT BE AFRAID OF IT! :-)
Google knows everything! ChatGPT might sometimes be dumb, but it also knows a lot. If you run into errors, instead of giving up, google them or ask ChatGPT.
What I personally use ChatGPT for the most is debugging and grammar correction. My favorite short prompt: “Correct grammar without changing my tone.”
In my opinion, DO NOT use it too much for writing (emails, articles, or whatever else). In its current state, it’s still quite dumb. The language is not natural, and it may affect the reader’s perception of you :-) Oh! If you do use it for writing, use it smartly (dedicate some time to your prompting)!