---
title: "Exploratory Data Analysis of Swiftkey Dataset"
author: "Peter Caya"
date: "November 12, 2016"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Questions to Consider from the Coursera Program:

> Tasks to accomplish
>
> Exploratory analysis - perform a thorough exploratory analysis of the data, understanding the distribution of words and relationship between the words in the corpora.
>
> Understand frequencies of words and word pairs - build figures and tables to understand variation in the frequencies of words and word pairs in the data.
>
> Questions to consider
>
> Some words are more frequent than others - what are the distributions of word frequencies?
>
> What are the frequencies of 2-grams and 3-grams in the dataset?
>
> How many unique words do you need in a frequency sorted dictionary to cover 50% of all word instances in the language? 90%?
>
> How do you evaluate how many of the words come from foreign languages?
>
> Can you think of a way to increase the coverage -- identifying words that may not be in the corpora or using a smaller number of words in the dictionary to cover the same number of phrases?

```{r}

```
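
One possible way to approach the word-frequency, n-gram, and coverage questions above is sketched below in base R. This is only an illustration under stated assumptions: `sample_lines` is a hypothetical stand-in for text read from the corpus files (no data is loaded in this document), and the helper names `tokenize`, `ngrams`, `freq_table`, and `words_for_coverage` are made up for this sketch rather than taken from any package.

```{r ngram-coverage-sketch}
# Hypothetical stand-in for lines read from the corpus (e.g. via readLines()).
sample_lines <- c(
  "the cat sat on the mat",
  "the dog sat on the log",
  "a cat and a dog"
)

# Lowercase each line and split it on runs of characters that are not
# letters or apostrophes; drop any empty tokens left by the split.
tokenize <- function(lines) {
  tokens <- strsplit(tolower(lines), "[^a-z']+")
  lapply(tokens, function(x) x[nzchar(x)])
}

# Build n-grams within each line so that no n-gram spans a line boundary.
ngrams <- function(token_list, n) {
  unlist(lapply(token_list, function(x) {
    if (length(x) < n) return(character(0))
    starts <- seq_len(length(x) - n + 1)
    vapply(starts,
           function(i) paste(x[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
}

# Frequency table sorted from most to least frequent.
freq_table <- function(items) sort(table(items), decreasing = TRUE)

# Smallest number of top-ranked words whose counts reach `target`
# (a proportion between 0 and 1) of all word instances.
words_for_coverage <- function(freqs, target) {
  cumulative <- cumsum(as.numeric(freqs)) / sum(freqs)
  which(cumulative >= target)[1]
}

tokens <- tokenize(sample_lines)
unigram_freq <- freq_table(unlist(tokens))
bigram_freq <- freq_table(ngrams(tokens, 2))
trigram_freq <- freq_table(ngrams(tokens, 3))

head(unigram_freq)
head(bigram_freq)
head(trigram_freq)
words_for_coverage(unigram_freq, 0.5)
words_for_coverage(unigram_freq, 0.9)
```

On this toy sample, 4 of the 9 distinct words cover 50% of the 17 word instances and 8 cover 90%. The tokenizer here is deliberately crude (ASCII letters and apostrophes only), so a real analysis of the corpus would likely need more careful handling of punctuation, numbers, and non-English text.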

## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
