…analysis, different types of analytics, and different…

sampleitems = c(sampleitem1, sampleitem2, sampleitem3, sampleitem4)

In the given syntax, the combined values are stored in the sampleitems variable. To read the data values of a small dataset, the c() command is used.

Q. Explain the working of the scan() command in R.
Ans. When using the c() command, you may find typing all the commas to separate values tedious. You can do the same job without the commas by using the scan() command. Apart from entering text or numeric data into datasets, you can use this command with the clipboard to get data from files. Unlike the c() command, the scan() command is called with empty parentheses. After you execute the command and press the Enter key, you are prompted to enter the desired data.

Q. Compare the write.table() and write.csv() commands.
Ans. The write.table() command is used to write the data stored in an object, such as a vector, to a file. The data is saved using delimiters, such as spaces or tabs. Apart from space- or tab-delimited values, comma-separated values can also be stored in files. The write.csv() command is used to write data to a CSV file.

22
Manipulating and Processing Data in R

In this chapter:
Selecting the Most Appropriate Data Structure
Creating Data Subsets
Merging Datasets in R
Sorting Data
Putting Your Data into Shape
Managing Data in R Using Matrices
Managing Data in R Using Data Frames

A data structure is a specialized format for organizing and storing data values. R provides various types of data structures, such as vectors and data frames. It also provides various functions to manipulate and process the data stored in these data structures. Some examples of such functions are subset(), merge(), and order().
The subset() function helps in extracting particular data from a dataset. The merge() function helps in joining the data of different data structures. The order() function helps in sorting data in ascending or descending order. R also provides techniques for reshaping and formatting data, for example, changing the shape of data from a wide format to a long format.

In this chapter, you will learn how to manipulate and process data in R. First, you learn how to select an appropriate data structure to store your data. Next, you learn how to create data subsets from different types of data structures, how to combine or merge datasets in R, and how to sort datasets in ascending or descending order. Towards the end, you learn how to reshape data into different formats, such as the long and wide formats.

Selecting the Most Appropriate Data Structure

Martin has joined as a new member of the data analytics team of Argon Technology. He has to analyze various types of data. Before analyzing the data, he needs to place it in R, and for this, he needs to select appropriate data structures.

In computer terminology, a data structure refers to a particular way of arranging data in the memory of a computer. Different applications use different types of data structures as per their computing and storage requirements. R provides the following types of data structures:
• Vector: It is a structure that can contain one or more values of a single type, such as character, numeric, or integer. It can be compared to a single column or row in a spreadsheet or a database table. Vectors are used to represent only one-dimensional data, such as a set of digits.
• Matrix: It is a two-dimensional vector. You can create a matrix by using the matrix() function. This function takes the values to be arranged in either column or row format. The ncol argument is used to specify the number of columns, and the nrow argument is used to specify the number of rows.
• Data frame: It is a list of named vectors of the same length. It resembles a spreadsheet or a database table, with columns and a heading for each column. In other words, a data frame can be viewed as a matrix in which data is stored in named columns; unlike a matrix, however, different columns can hold different types of data. Data frames are generally used to represent time series.
• List: It is an ordered collection of objects. A list is not fixed in length and can contain other lists.

As discussed, depending upon their requirements, different applications may use different types of data structures. The following are some guidelines to select an appropriate data structure:
• To store one-dimensional data, such as a set of digits, use vectors.
• To store data that has more than one dimension, such as a set of digits and alphabets, use matrices, lists, or data frames.
• To store data belonging to a single class, use matrices or higher-dimensional arrays.
• To store data belonging to multiple classes, use lists or data frames.
• To store a collection of objects that cannot be represented as an array or a data frame, use lists, as they can contain all kinds of objects, including other lists or data frames.

Creating Data Subsets

You already know that R deals with large amounts of data, not all of which is useful. Therefore, the first step in the analysis process is to sort out the data containing relevant or useful information. The extracted datasets are then divided into small subsets of data for further processing. You can use the subset() function to create data subsets in R. The following operators also help you to form subsets of data in R:
• $: The dollar operator is used to select a single element of data. When this operator is used with a data frame, the result is always a vector.
• [[ ]]: The double square brackets operator is used to return a single element. Unlike $, it lets you refer to an element by its position instead of its name. Generally, it is used to access the data stored in data frames and lists.
• [ ]: The single square brackets operator is used to extract multiple elements of data.

Let's now see how the subset() function and these operators are used to create subsets of data.

Creating Subsets in Vectors

To create subsets of the data stored in a vector, you can use the subset() function or the [ ] brackets. Listing 22.1 shows the use of the subset() function and the [ ] brackets to create subsets of a vector:

Listing 22.1: Using the subset() Function and the [ ] Brackets
# A sample vector
v <- c(1, 7, 6, 5, 6, 3, 2, 4, 2)        # Creates a vector with a list of numbers
subset(v, v > 4)                         # Using the subset() function: numbers greater than 4
v[v > 4]                                 # Using square brackets: numbers greater than 4
# Another vector
t <- c("one", "one", "two", "three", "four", "two")   # Creates a vector with a list of texts
# Remove "one" entries
subset(t, t != "one")                    # Removes "one" by using the subset() function
t[t != "one"]                            # Removes "one" by using the [ ] brackets

Figure 22.1 shows the output of executing the code given in Listing 22.1:
Figure 22.1: Creating Subsets in Vectors

In Listing 22.1, two sample vectors, v and t, are created. The v vector is initialized with a list of numbers, and the t vector is initialized with a series of texts. First, the subset() function and the [ ] brackets are used to create a subset of the numbers greater than 4. Then, the subset() function and the [ ] brackets are used to create a subset of texts after removing the text "one".

An important difference between the subset() function and square brackets is that by using square brackets, you can assign values to elements; however, you cannot do that by using the subset() function, as shown in Listing 22.2:

Listing 22.2: Assigning Values Using [ ] Brackets
v[v < 3] <- 7              # Replaces the numbers less than 3 with 7
subset(v, v < 3) <- 7      # Fails with an error

Figure 22.2 shows the output of Listing 22.2:
Figure 22.2: Assigning Values by Using the Square Brackets

In Listing 22.2, the [ ] brackets are used with the assignment operator to replace the numbers less than 3 with a new number, 7. However, if you try to do the same task by using the subset() function and the assignment operator, you get an error (could not find function "subset<-"), because R has no replacement form of subset().

Creating Subsets in Data Frames

In this section, you learn to create subsets of data frames by using the subset() function and the [ ] brackets. Listing 22.3 shows how to create simple subsets of a data frame:

Listing 22.3: Creating Simple Subsets of Data Frames
# A sample data frame
data <- read.table(header = TRUE, text = '
subject class marks
…
')
subset(data, subject < 3)       # Creates a subset where subject is less than 3 by using subset()
data[data$subject < 3, ]        # Creates the same subset by using the [ , ] brackets

Figure 22.3 shows the output of executing the code shown in Listing 22.3:
Figure 22.3: Creating Subsets in Data Frames

In Listing 22.3, a sample data frame, named data, is created from a table. Then, the subset() function and the [ , ] brackets are used to create subsets of data where subject is less than 3.

Listing 22.4 shows how to create subsets of particular rows and columns:

Listing 22.4: Creating Subsets of Particular Rows and Columns
# Subset of particular rows and columns
subset(data, subject < 3, select = -subject)           # subject less than 3, subject column excluded
subset(data, subject < 3, select = c(class, marks))    # the same, naming the columns to keep
subset(data, subject < 3, select = class:marks)        # the same, using a range of columns
data[data$subject < 3, c("class", "marks")]            # the same, by using the [ , ] brackets

Figure 22.4: Creating Subsets on Particular Rows and Columns of Data Frame

Listing 22.4 shows how to display records without including the subject column, first by using the subset() function and then by using the [ ] brackets.

Listing 22.5 shows how to create subsets based on different logical conditions:

Listing 22.5: Creating Subsets of Different Logical Conditions
# Logical AND of two conditions
subset(data, subject < 3 & class == 2)                 # subject less than 3 and class equal to 2
data[data$subject < 3 & data$class == 2, ]
# Logical OR of two conditions
subset(data, subject < 3 | class == 2)                 # subject less than 3 or class equal to 2
data[data$subject < 3 | data$class == 2, ]

Figure 22.5 shows the output of executing the code given in Listing 22.5:
Figure 22.5: Creating Subsets of Logical AND and OR of Two Conditions

In Listing 22.5, the subset() function and the [ ] brackets are first used to display the records of data where the subject is less than 3 and the class equals 2. Next, the subset() function and the [ ] brackets are used to display the records of data where either the subject is less than 3 or the class equals 2.

Merging Datasets in R

Sometimes, similar datasets obtained from different sources need to be merged together for further processing. R provides the following functions to combine different sets of data:
• The merge() function: It is used to merge the data contained in different data frames on the basis of columns and rows.
• The cbind() function: It is used to add the columns of datasets having an equal set and identical order of rows.
• The rbind() function: It is used to add rows in datasets having an equal number of columns.

Figure 22.6 shows some ways of merging different datasets in R:
Figure 22.6: Ways of Merging Different Datasets in R (adding columns with cbind() and rows with rbind())

Using the merge() Function

The merge() function combines the data of two data frames on the basis of a common column between the two.
It takes the following arguments:
• x: Specifies a data frame.
• y: Specifies a data frame.
• by, by.x, by.y: Specify the names of the columns common to both x and y.
• all, all.x, all.y: Specify logical values for the type of merge. The default value is FALSE.

Listing 22.6 shows how to create two data frames and combine them by using the merge() function:

Listing 22.6: Creating and Merging Two Data Frames
# Make the first data frame
df1 <- read.table(header = TRUE, text = '
ID Name
1 Anne
2 John
3 Berkeley
')
# Make another data frame
data <- read.table(header = TRUE, text = '
ID English Maths
…
')
# Merge the two data frames
merge(df1, data, "ID")      # Combines df1 and data on the basis of the ID column

Figure 22.7 shows the output of executing the code given in Listing 22.6:
Figure 22.7: Merging Two Data Frames Using Same Column Name

In Listing 22.6, the first data frame, df1, is created with two columns, ID and Name. The second data frame, data, is created with the ID, English, and Maths columns. The ID column is common to both data frames, and it is the column on the basis of which the data is combined.

If the common column has different names in the two data frames, you can explicitly specify the column names, as shown in Listing 22.7:

Listing 22.7: Using Different Column Names
# In this case, the column is named "StudentID" instead of "ID"
df3 <- read.table(header = TRUE, text = '
StudentID Name
1 Anne
2 John
3 Berkeley
')
# Merge on df3$StudentID and data$ID
merge(x = df3, y = data, by.x = "StudentID", by.y = "ID")   # Combines df3 and data by matching StudentID with ID

Figure 22.8 shows the output of executing the code given in Listing 22.7:
Figure 22.8: Merging Data by Specifying Column Names Explicitly

In Listing 22.7, a data frame, df3, is created with two columns, StudentID and Name. The merge() function combines the data of df3 and data on the basis of the StudentID column of df3 and the ID column of data.
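The all, all.x, and all.y arguments of merge() decide which unmatched rows survive, and by = NULL produces every combination of rows. The following is a minimal sketch using two small data frames whose names and values (students, scores) are hypothetical and not taken from the listings in this chapter; student 4 has no score and ID 5 has no name, so each type of merge returns a different number of rows:

```r
# Hypothetical data frames, made up for illustration only
students <- data.frame(ID = c(1, 2, 3, 4),
                       Name = c("Anne", "John", "Berkeley", "Harry"),
                       stringsAsFactors = FALSE)
scores <- data.frame(ID = c(1, 2, 3, 5),
                     English = c(80, 89, 70, 65))

inner <- merge(students, scores, by = "ID")                 # all = FALSE (default): matching rows only
outer <- merge(students, scores, by = "ID", all = TRUE)     # unmatched rows from both sides, filled with NA
left  <- merge(students, scores, by = "ID", all.x = TRUE)   # every row of x (students)
right <- merge(students, scores, by = "ID", all.y = TRUE)   # every row of y (scores)
cross <- merge(students, scores, by = NULL)                 # every row of x paired with every row of y

# Number of rows produced by each type of merge: 3, 5, 4, 4, 16
sapply(list(inner = inner, outer = outer, left = left,
            right = right, cross = cross), nrow)
```

Rows that have no partner on the other side are kept only when the corresponding all argument is TRUE, and their missing fields are filled with NA.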
You can also combine different data frames by using multiple column names, as shown in Listing 22.8:

Listing 22.8: Combining Data Using Multiple Columns of Data Frames
# Make up more data
Details <- read.table(header = TRUE, text = '
ID Name Class
…
')
Marks <- read.table(header = TRUE, text = '
ID Name English Maths
…
')
merge(Details, Marks, c("ID", "Name"))     # Combines Details and Marks on the ID and Name columns

Figure 22.9 shows the output of executing the code given in Listing 22.8:
Figure 22.9: Merging Data Using Two Column Names

In Listing 22.8, a data frame, Details, is created with three columns, ID, Name, and Class. The second data frame, Marks, consists of four columns, ID, Name, English, and Maths. The merge() function combines the data of Details and Marks on the basis of two columns, ID and Name.

Table 22.1 shows five ways to combine data by using the merge() function:

Table 22.1: Different Ways of Combining Data by Using the merge() Function
Join type | Purpose | Argument | Example
Natural join | To keep only the rows that match in both data frames | all = FALSE | merge(x = Details, y = Marks, by = "ID", all = FALSE)
Full outer join | To keep all rows from both data frames | all = TRUE | merge(x = Details, y = Marks, by = "ID", all = TRUE)
Left outer join | To include all the rows of data frame x and only those from y that match | all.x = TRUE | merge(x = Details, y = Marks, by = "ID", all.x = TRUE)
Right outer join | To include all the rows of data frame y and only those from x that match | all.y = TRUE | merge(x = Details, y = Marks, by = "ID", all.y = TRUE)
Cross join | To combine every row of x with every row of y | by = NULL | merge(x = Details, y = Marks, by = NULL)

Using the cbind() Function

The cbind() function is used to bind the columns of two datasets. It helps in restricting the number of columns to be included in the new dataset.
Listing 22.9 shows the use of the cbind() function:

Listing 22.9: Using the cbind() Function
Details
Marks
cbind(Details[, c(1, 2, 3)], Marks[, c(3, 4)])    # Columns 1-3 of Details and columns 3-4 of Marks

Figure 22.10 shows the output of executing the code given in Listing 22.9:
Figure 22.10: Using the cbind() Function

In Listing 22.9, first, the data of the Details and Marks data frames is displayed. Then, the cbind() function is used to combine the first, second, and third columns of the Details data frame with the third and fourth columns of the Marks data frame.

Using the rbind() Function

The rbind() function is used to bind the rows of two datasets. It helps in restricting the number of rows to be included in the new data frame. Listing 22.10 shows the use of the rbind() function:

Listing 22.10: Using the rbind() Function
Data1 <- read.table(header = TRUE, text = '
…
')
Data2 <- read.table(header = TRUE, text = '
…
')
rbind(Data1, Data2)      # Combines the rows of Data1 and Data2

Figure 22.11 shows the output of executing the code given in Listing 22.10:
Figure 22.11: Using the rbind() Function

In Listing 22.10, two data frames, Data1 and Data2, are created. Then, the rbind() function is used to combine the rows of Data1 and Data2.

Sorting Data

The head of the data analytics team asked Martin to sort the data stored in various data structures, such as vectors, matrices, or data frames. Martin decides to perform this task by using the sort() and order() functions.

R provides various functions that allow you to define the order of the data in a dataset. The sort() and order() functions are the most commonly used functions to perform this task. You will learn about these functions in the following sections.
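The two functions differ in what they return: sort() returns the rearranged values themselves, while order() returns the positions that would rearrange them. The following minimal sketch uses made-up values (x and df are illustrative names, not taken from the listings) to show why order() is the one used for sorting the rows of a data frame:

```r
# Made-up values for illustration
x <- c(30, 10, 20)

sorted <- sort(x)     # the values, rearranged: 10 20 30
idx    <- order(x)    # the positions that would sort x: 2 3 1
x[idx]                # indexing with order() reproduces sort(): 10 20 30

# The index vector from order() reorders every column of a data frame in step
df <- data.frame(name = c("c", "a", "b"), score = x, stringsAsFactors = FALSE)
df[order(df$score), ]
```

Because order() returns positions rather than values, the same index vector can be applied to the rows of a whole data frame, which is how the data frame listings in the following sections use it.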
Sorting Vectors

The sort() function is used to arrange the values contained in a vector in ascending or descending order. Listing 22.11 shows an example:

Listing 22.11: Sorting and Reverse Sorting of a Vector
vec1 <- c(23, 45, 10, …)
# Sorting of a vector
sort(vec1)
# Reverse sorting
sort(vec1, decreasing = TRUE)       # Sorts vec1 in descending order

Figure 22.12 shows the output of executing the code given in Listing 22.11:
Figure 22.12: Displaying Sorting and Reverse Sorting on a Vector

In Listing 22.11, a vector, named vec1, is created with a list of numbers. The sort() function is used to sort vec1 in ascending and descending order. Note that when you pass the decreasing = TRUE argument to the sort() function, the function returns the numbers in descending order.

Ordering Data

The order() function is used to arrange the values or columns in a dataset. Listing 22.12 shows how to create a data frame and use the order() function in different ways to sort one or more columns of the data frame:

Listing 22.12: Creating and Sorting a Data Frame
# Make a data frame
SampleDataFrame <- data.frame(id = 1:5,
                              weight = c(25, 37, 14, 62, 55),
                              size = c("small", "large", "medium", "large", "medium"),
                              stringsAsFactors = TRUE)   # keeps size as a factor (the default before R 4.0.0)
SampleDataFrame
# Sort by weight
SampleDataFrame[order(SampleDataFrame$weight), ]
# Sort by size, then by weight
SampleDataFrame[order(SampleDataFrame$size, SampleDataFrame$weight), ]
# Sort by all columns in the data frame, from left to right
SampleDataFrame[do.call(order, as.list(SampleDataFrame)), ]
# In this particular example, the order will be unchanged

Figure 22.13 shows the output of executing the code given in Listing 22.12:
Figure 22.13: Displaying Different Ways to Sort a Data Frame

In Listing 22.12, a data frame, SampleDataFrame, is created with some data. Then, the order() function sorts SampleDataFrame on the basis of the weight column. Next, the order() function sorts SampleDataFrame on the basis of multiple columns, first by size and then by weight. Finally, the do.call() and as.list() functions are used to sort SampleDataFrame by all of its columns, from left to right. Because the first column, id, is already in ascending order, the order of the rows does not change.

Reverse Sort

You can reverse the order of the data contained in a column of a data frame in two ways: by using the decreasing = TRUE argument in the order() function, or by placing the minus (-) sign before the column name. Listing 22.13 shows examples of sorting a data frame in decreasing order:

Listing 22.13: Sorting a Data Frame in Decreasing Order
# Sort by weight, in decreasing order
SampleDataFrame[order(SampleDataFrame$weight, decreasing = TRUE), ]
# Sort by size (increasing), then by weight (decreasing)
SampleDataFrame[order(SampleDataFrame$size, -SampleDataFrame$weight), ]

Figure 22.14 shows the output of executing the code given in Listing 22.13:
Figure 22.14: Sorting Data Frames in Descending Order

In Listing 22.13, first, the order() function sorts SampleDataFrame in the decreasing order of the weight column with the help of the decreasing = TRUE argument.
Then, the order() function sorts SampleDataFrame in the increasing order of the size column and then in the decreasing order of the weight column. In this case, the decreasing order is specified by using the minus (-) symbol.

Note that if you wish to sort a column of factors in decreasing order by using the minus symbol, you need to use the xtfrm() function. Listing 22.14 shows an example of using the xtfrm() function:

Listing 22.14: Using the xtfrm() Function
# Works on a column of factors
SampleDataFrame[order(-xtfrm(SampleDataFrame$size), SampleDataFrame$weight), ]
# Displays a warning
SampleDataFrame[order(-SampleDataFrame$size, SampleDataFrame$weight), ]

Figure 22.15 shows the output of executing the code given in Listing 22.14. The second command displays the following warning message:
In Ops.factor(SampleDataFrame$size) : '-' not meaningful for factors
Figure 22.15: Using the xtfrm() Function

In Listing 22.14, the order() function sorts SampleDataFrame in the decreasing order of the size column and then in the increasing order of the weight column. Note that the size column stores factors, so you need to use the xtfrm() function; otherwise, a warning is displayed (Figure 22.15). If size is stored as a character column instead (the default for data.frame() from R 4.0.0 onwards), the minus sign raises an error rather than a warning, and xtfrm() is again the fix.

Putting Your Data into Shape

R provides a variety of methods for reshaping data prior to its analysis. Some of these methods are as follows:
• Transposing data
• Converting data between the wide and long formats
Let's learn about these methods in the following sections.

Transposing Data

You can use the t() function to transpose a matrix or a data frame. This function turns rows into columns and columns into rows. Listing 22.15 shows the use of the t() function:

Listing 22.15: Using the t() Function
t(SampleDataFrame)      # Transposes SampleDataFrame

Converting Data to Wide or Long Format

Wide data contains more columns and fewer rows as compared to long data. You can use the following functions of the reshape2 package to convert data between the two formats:
• Use the melt() function to convert wide data into the long format.
• Use the dcast() function to convert long data into the wide format.
• Use the acast() function to convert long data into an array.
When data is reshaped, its columns are treated either as identifier (ID) variables or as measured variables. The ID variables act like the keys that identify your observations, whereas the measured variables represent the observed measurements.

Let's now learn how to melt data into the long format.

Melting Data to Long Format

You need to use the melt() function to convert data from the wide format into the long format. The melt() function is contained in the reshape2 package. Therefore, to execute the melt() function, you must install the reshape2 package in R. Listing 22.16 shows the use of the melt() function:

Listing 22.16: Using the melt() Function
data <- read.table(header = TRUE, text = '
subject sex control cond1 cond2
…
')
library(reshape2)                 # Loads the reshape2 package
newdata.long <- melt(data)        # Converts wide format data into long format
newdata.long                      # Displays the long format data

Figure 22.17 shows the output of executing the code given in Listing 22.16:
Figure 22.17: Melting a Data Frame from Wide Format to Long Format

In Listing 22.16, a data frame, named data, is created. Then, the library() command is used to load the reshape2 package in R. After that, the melt() function of the reshape2 package is used to convert the wide format data into the long format, stored in the newdata.long data frame. To download and install the reshape2 package, use the install.packages() function as follows:
install.packages("reshape2")

The melt() function, by default, considers all categorical variables (factor or character columns) as identifier variables. Numeric columns, including subject in this example, are therefore treated as measured variables unless you name the ID variables explicitly.
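If the reshape2 package is not available, the same wide-to-long idea can be sketched with base R's stats::reshape() function. Note that this is a swapped-in base-R function, not the melt() approach of Listing 22.16, and the data frame below (wide) uses the same column layout as Listing 22.16 but made-up values:

```r
# Hypothetical wide data: same columns as Listing 22.16, made-up values
wide <- data.frame(subject = 1:2, sex = c("M", "F"),
                   control = c(7.5, 6.0), cond1 = c(12.0, 10.5), cond2 = c(10.5, 11.0),
                   stringsAsFactors = FALSE)

long <- reshape(wide,
                direction = "long",
                idvar     = c("subject", "sex"),             # identifier (ID) variables
                varying   = c("control", "cond1", "cond2"),  # measured variables to stack
                v.names   = "measurement",                   # name of the new value column
                timevar   = "condition",                     # name of the new key column
                times     = c("control", "cond1", "cond2"))  # values written into the key column
long[order(long$subject), ]
```

Each of the two wide rows becomes three long rows, one per measured column, so the result has six rows; the idvar, varying, v.names, and timevar arguments play roughly the roles of melt()'s id.vars, measure.vars, value.name, and variable.name.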
You can also call the melt() function with other options, as shown in Listing 22.17:

Listing 22.17: Specifying Other Options with the melt() Function
newdata.long <- melt(data,
                     id.vars = c("subject", "sex"),                  # ID variables to keep but not split apart on
                     measure.vars = c("control", "cond1", "cond2"),  # the source columns
                     variable.name = "condition",                    # destination column naming the source column
                     value.name = "measurement")                     # column that stores the values
newdata.long

Figure 22.18 shows the output of executing the code given in Listing 22.17:
Figure 22.18: Specifying the melt() Function with Other Options

In Listing 22.17, the melt() function specifies the ID variables, the source columns, and the destination column names by using the id.vars, measure.vars, variable.name, and value.name arguments. If you do not specify variable.name, melt() names that column "variable", and if you leave out value.name, it names that column "value".

Casting Data to Wide Format

The dcast() function is used to convert data from the long format to the wide format. Listing 22.18 shows the use of the dcast() function of the reshape2 package to convert the data back from the long format to the wide format:

Listing 22.18: Using the dcast() Function
newdata.wide <- dcast(newdata.long, subject + sex ~ condition, value.var = "measurement")
# Displaying the wide format data with the original column names
newdata.wide
# Renaming the columns in the wide format
names(newdata.wide)[names(newdata.wide) == "cond1"] <- "first"
names(newdata.wide)[names(newdata.wide) == "cond2"] <- "second"
newdata.wide

Figure 22.19 shows the output of executing the code given in Listing 22.18:
Figure 22.19: Converting Long Format Data into Wide Format

In Listing 22.18, the dcast() function is used to cast the data from the long format to the wide format. The names() function is then used to rename the columns.

Managing Data in R Using Matrices

Before creating, manipulating, and accessing matrix data, you should know how data is stored in a tabular format and how that data is stored in a matrix. In this section, you learn how to create a matrix, reshape a vector into a matrix, access matrices, and create subsets of matrices. Listing 22.19 shows how to create matrices:

Listing 22.19: Creating Matrices
matrix(1, 3, 5)       # Creates a matrix of 3 rows and 5 columns with all the fields set to 1
a <- 1:15
matrix(a, 3, 5)       # Creates a matrix of 3 rows and 5 columns from the numbers stored in a
matrix(5, 5, 5)       # Creates a matrix of 5 rows and 5 columns with all the fields set to 5

Figure 22.20 shows the output of executing the code given in Listing 22.19:
Figure 22.20: Creating Matrices

In Listing 22.19, first, the matrix() function creates a matrix of 3 rows and 5 columns, with all the fields in the matrix set to 1. Next, a series of numbers from 1 to 15 is assigned to a variable, a. Then, the matrix() function creates a matrix of 3 rows and 5 columns with the numbers stored in the a variable. Finally, the matrix() function creates a matrix of 5 rows and 5 columns, with all the fields set to 5 (Figure 22.20). A matrix with an equal number of rows and columns is called a square matrix.

Reshaping a Vector into a Matrix

To reshape a vector into a matrix, set the dimensions of the vector by using the dim() function. Listing 22.20 shows a set of commands to create a vector and convert it into a matrix:

Listing 22.20: Creating a Vector and Converting It into a Matrix
matVector <- 1:20                # Creates a vector
dim(matVector) <- c(4, 5)        # 4 rows and 5 columns
matVector                        # Displays the matrix

Figure 22.21 shows the output of executing the code given in Listing 22.20:
Figure 22.21: Converting a Vector into a Matrix

In Listing 22.20, a vector, named matVector, is created and assigned a sequence of numbers from 1 to 20. Then, the dim() and c() functions are used to give matVector a structure of 4 rows and 5 columns. In this way, matVector is converted into a matrix, and its content is displayed.

The class() function is used to determine the class of an object, that is, whether the object is a matrix, a number, or a data frame.
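Setting dim() does not move any values; it only tells R how to read the existing vector, and R reads it column by column. The following minimal sketch shows this on a six-element vector (v is an illustrative name, not the matVector of Listing 22.20) and checks the class before and after:

```r
v <- 1:6
class(v)                  # "integer": still a plain vector
dim(v) <- c(2, 3)         # 2 rows, 3 columns
class(v)                  # "matrix" "array" in R 4.0.0 and later ("matrix" before that)
v
# R fills the matrix column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
identical(v, matrix(1:6, nrow = 2))   # TRUE: the same result as matrix()
matrix(1:6, nrow = 2, byrow = TRUE)   # byrow = TRUE fills row by row instead
```

So dim() and matrix() lay out the values in the same column-major order; only matrix() offers the byrow argument for filling rows first.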
You can determine the class of an object, matVector, by using the following commands:
class(matVector)
print(matVector)

Figure 22.26: Merging Two Data Frames

In Listing 22.25, the merge() function is used to merge the data of the area and popuVehicle data frames. The merged data is stored in a new data frame, mergeData. The dim() function displays the number of rows and columns of mergeData. The colnames() and rownames() functions display the column names and row names of mergeData, respectively. The names() function also displays the column names of mergeData. Finally, the cbind() function is used to restrict the number of columns taken from the area and popuVehicle data frames: it displays the first and second columns of area and the second column of popuVehicle.

Performing Operations on Data Frames

Now, let's learn to perform some basic operations on data frames, such as sorting and transposing. Listing 22.26 shows how to perform operations on data frames:

Listing 22.26: Performing Operations on Data Frames

Figure 22.27: Performing Operations on Data Frames

In Listing 22.26, the order() function is used to sort the mergeData data frame in descending order. The t() function is used to swap the columns of mergeData into rows and vice versa. Because t() returns a matrix, mergeData no longer remains a data frame after its rows and columns are swapped. You can convert it back into a data frame by using the as.data.frame() function.

Summary

In this chapter, you learned about various types of data structures available in R.
Next, you learned how to create data subsets from different types of data structures by using the subset() function and different operators, such as [ ]. Then, you learned how to combine or merge datasets in R by using the merge(), cbind(), and rbind() functions. After that, you learned how to sort datasets in ascending or descending order by using the sort() and order() functions. Finally, you learned to reshape data into different formats.

The ______ function is used to sort a data frame.
a. sort()
b. order()
c. …
d. …
The correct option is b.

Subjective Questions

Q1. List some types of data structures available in R.
Ans. R provides the following types of data structures:
• Vector: It is a structure that can contain one or more values of a single type, such as character, numeric, or integer. It can be compared to a single column or row in a spreadsheet or a database table. Vectors are used to represent only one-dimensional data, such as a set of digits.
• Matrix: It is a two-dimensional vector. You can create a matrix by using the matrix() function. This function takes the values to be arranged in either column or row format. The ncol argument is used to specify the number of columns, and the nrow argument is used to specify the number of rows.
• Data frame: It is a list of named vectors of the same length. It resembles a spreadsheet or a database table, with columns and a heading for each column. In other words, a data frame can be viewed as a matrix in which data is stored in named columns. Data frames are generally used to represent time series.
• List: It is an ordered collection of objects. A list is not fixed in length and can contain other lists.

Q2. Name some operators used to form data subsets in R.
Ans. The following operators help you to form subsets of data in R:
• $: The dollar operator is used to select a single element of data. When this operator is used with a data frame, the result is always a vector.
• [[ ]]: The double square brackets operator is used to return a single element. Unlike $, it lets you refer to an element by its position instead of its name. Generally, it is used to access the data stored in data frames and lists.
• [ ]: The single square brackets operator is used to extract multiple elements of data.

Q3. List the functions provided by R to combine different sets of data.
Ans. R provides the following functions to combine different sets of data:
• The merge() function: It is used to merge the data contained in different data frames on the basis of columns as well as rows.
• The cbind() function: It is used to add the columns of datasets having an equal set and identical order of rows.
• The rbind() function: It is used to add rows in datasets having an equal number of columns.

Q4. Suppose you have two datasets, A (1, 2, 3, 4) and B (5, 6, 7, 8). Combine these datasets into one dataset by using the merge() function.
Ans. Dataset A and dataset B can be combined as shown:
> A <- c(1, 2, 3, 4)
> B <- c(5, 6, 7, 8)
> …

23
Working with Functions and Packages in R

In this chapter:
Using Functions Instead of Scripts
Working with Packages

Subjective Questions

Q1. What are functions?
Ans. A function is a named block of code that performs a specific task in a program and is executed when it is called from some other part of the program. A large program can be broken down into smaller parts by using functions, which makes the program more readable. A function needs to be defined in a program before it is used in that program. You can create as many functions as you want in a program and call them from any part of that program.

Q2. What are the two main advantages of using functions over scripts?
Ans. Using functions in R provides the following two main advantages:
• Functions can work with variable inputs, so the same code can be reused with different values.
• Functions return the result as an object. This result can be further used as an input for another function or program.

Q3. Discuss the syntax of defining a function in R.
Ans. The syntax of defining a function in R is as follows:

<function name> <- function(arg1, arg2, ...)
{
  Statement 1
  Statement 2
  return(output)
}

In the preceding syntax:
- <function name>: Refers to the name of the function.
- function: Refers to the keyword used for defining a function.
- (arg1, arg2, ...): Refers to the set of arguments given to a function. A function can have one or more arguments; however, it is not necessary to specify arguments in a function. The arguments of a function are separated by commas and are enclosed within parentheses. The three dots (...) are known as the "dot arguments", which means that you can pass additional arguments, other than arg1 and arg2, when the function is called.
- Statement 1, Statement 2, ..., and the return statement: Define the body of the function and are written within curly braces.
- return(output): Ends the function and sends the value of output back to the code that called it. If return() is omitted, the function returns the value of its last evaluated expression.

For example, in the following code fragment, print() is the function used to display the value held by the variable b:

print("Area of circle after removing r=")
print(b)   # displays the value held by b

Q4. What are arguments? Why are they used in functions?
Ans. Arguments can be defined as variables, constants, or expressions that contain values which will be used in functions. As discussed earlier, a function may or may not have arguments. The arguments of a function receive their values from the calling code when the function is called at runtime. These values can then be used in the function for computational purposes.

Q5. What is the difference between optional and required arguments?
Ans. The arguments that already contain a default value are known as optional arguments, because it is optional for a user to specify their values. The required arguments, on the other hand, are those arguments whose values need to be specified while calling a function.

Q6. What do you understand by the local and global environment of a function?
Ans. The workspace of R is known as the global environment, in which the function operates. The environment created inside a function when it is called is known as the local environment of that function. The local environment is nested within the global environment (for a function defined at the top level), so code inside the function can read variables from the global environment, while variables created inside the function exist only in its local environment.

24 Performing Graphical Analysis in R
If you need information on:
Saving Graphs to External Files

This chapter discusses time series plots and plots for multiple variables, created by using the pairs() function and the coplot() function. It also discusses designing special plots, such as bubble plots. In the end, this chapter discusses saving graphs in external files.

Using Plots
R provides excellent graphics and plotting capabilities. It can efficiently display information graphically with the help of its plot() commands and functions. The data displayed in different graphs can be read from files or can be entered directly. Symbols and colors are used in graphs to differentiate between different datasets. Some useful functions used in plotting graphs or interacting with graphs are lines(), points(), abline(), curve(), text(), rug(), legend(), segments(), arrows(), polygon(), locator(), identify(), etc. Some of the common plots in R are:
- Strip charts
- Histograms
- Time series plots
- Scatter plots
- Index plots
- Bubble plots

All the preceding graphs can be drawn by using the following techniques:
- Using plots for a single variable
- Using plots for two variables
- Using plots for multiple variables
- Designing special plots

Subjective Questions (Chapter 24)

Q1. What do you understand by plotting in R?
Ans. R provides excellent graphics and plotting capabilities. It can efficiently display information graphically with the help of its plotting commands and functions. The data displayed in different graphs can be read from files or can be entered directly.
The symbols and colors are used in graphs to differentiate between different datasets. Some useful functions used in plotting graphs or interacting with graphs are lines(), points(), abline(), curve(), text(), rug(), legend(), segments(), arrows(), polygon(), locator(), identify(), etc. Some of the common plots in R are:
- Strip charts
- Histograms
- Time series plots
- Scatter plots
- Index plots
- Bubble plots

Q2. List the techniques used to draw graphs in R.
Ans. Graphs in R can be drawn by using the following techniques:
- Using plots for a single variable
- Using plots for two variables
- Using plots for multiple variables
- Designing special plots

Q3. What is a histogram?
Ans. A histogram comprises parallel vertical bars displaying the frequency distribution of a quantitative variable in a graphical format. The area occupied by each bar is proportional to the frequency of the items in that class. A histogram is one of the best ways to display the distribution of continuous data. The values of the response variable are distributed across the x axis, and each class is known as a bin. It is a little tricky to plot a histogram, as its plot requires subjective judgments for deciding where exactly to put the bin margins.

Q4. Write a short note on time series plots.
Ans. R provides the facility of analyzing time-related data. In a time series graph, the values are plotted against time as dots, which are then joined to produce the graph. A time series graph provides more accurate results when no data is missing over a period of time. The limitation of a time series graph is that it does not provide any information about missing values. For example, the sales value for a particular month of a year may be missing, or the sales values of many months may be missing over the last ten years, but none of this is reflected in the time series graph. The ts() and plot() functions are used for plotting a time series graph. The ts() function converts a numeric vector into an R time series object.
The syntax of using the ts() function is as follows:

<name of object> <- ts(vector, start = <start time>, end = <end time>, frequency = <observations per unit time>)

In the preceding syntax, start and end refer to the times of the first and last observations, and frequency refers to the number of observations per unit of time. For example, a frequency of 1 denotes annual data, 4 denotes quarterly data, and 12 denotes monthly data.

Q5. What is a strip chart?
Ans. A strip chart plots the data values along a single line, and each data point is shown as a small box. It is mainly used when the sample size is small, for example, when there are fewer than 30 observations. The main aim of using a strip chart is to view the location of each individual value in the small sample and to compare values across cases. The stripchart() function is used for plotting a strip chart.
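The ts(), plot(), hist(), and stripchart() calls described in the preceding answers can be sketched in R as follows. This is a minimal example under a few assumptions: it uses the built-in AirPassengers, faithful, and women datasets rather than data from this book, the variable names are illustrative, and it draws on a null PDF device (pdf(NULL)) so that it runs without opening a graphics window.

```r
# Assumption: draw on a null PDF device so the script runs non-interactively;
# in an interactive session you can drop pdf(NULL) and dev.off().
pdf(NULL)

# --- Time series plot ---
# Built-in AirPassengers: monthly airline passengers (thousands), 1949-1960.
# Strip it to a plain numeric vector to mimic raw data, then rebuild the series.
passengers <- as.numeric(AirPassengers)

# ts(vector, start, frequency): frequency = 12 marks the data as monthly,
# so 144 values starting in January 1949 end in December 1960.
passengers_ts <- ts(passengers, start = c(1949, 1), frequency = 12)
plot(passengers_ts, main = "Monthly air passengers",
     ylab = "Passengers (thousands)")

# --- Histogram ---
# The bin margins (breaks) are a subjective choice, so two choices are compared.
par(mfrow = c(1, 2))
h_wide <- hist(faithful$eruptions, breaks = seq(1.5, 5.5, by = 1),
               main = "Bin width 1 min", xlab = "Eruption time (minutes)")
h_narrow <- hist(faithful$eruptions, breaks = seq(1.5, 5.5, by = 0.5),
                 main = "Bin width 0.5 min", xlab = "Eruption time (minutes)")
par(mfrow = c(1, 1))

# --- Strip chart ---
# Built-in women dataset: 15 heights, a small sample suited to a strip chart.
# pch = 0 draws each value as a small square (box) along one line.
stripchart(women$height, pch = 0, xlab = "Height (inches)")

# Close the null device quietly
invisible(dev.off())
```

Comparing the two histograms shows how the choice of bin margins changes the apparent shape of the same distribution, which is the subjective judgment mentioned in the histogram answer.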
