The First Course on R – Foundations and Data Structures

— The post is part of my presentation for the Computational Biology Workshop for Clinicians (March 6, 2025) at AIIMS Kalyani.

1. Variables

Let's begin by using R as a basic calculator to sum the following numbers:

10 + 2 + 4 + 6 + 9 + 4 + 3 + 2

However, if you need to add another number (e.g., 10) and your list is extensive, manually rewriting the entire expression can be inefficient—especially if you do not remember the previous result.

10 + 2 + 4 + 6 + 9 + 4 + 3 + 2 + 10

This approach is not practical for larger datasets. A more efficient solution is to store the sum in a variable (think of it like a bag that stores the data) and perform operations on it:

x <- 10 + 2 + 4 + 6 + 9 + 4 + 3 + 2

x # Displays the stored sum

y <- x + 10 # Adds 10 to the previously computed sum

y # Displays the updated sum
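Here is a quick sketch of what the two variables now hold. The values in the comments follow from the arithmetic above. Creating y does not change x, which only changes when it is reassigned:

```r
x <- 10 + 2 + 4 + 6 + 9 + 4 + 3 + 2
y <- x + 10

print(x)  # 40
print(y)  # 50

# Reassigning x updates its stored value; y keeps the result it already had
x <- x + 10
print(x)  # 50
print(y)  # 50
```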

By using variables, you can avoid redundant calculations and make your code more manageable.

2. R Objects

Continuing with our analogy of variables as bags that store data, the object is the actual stuff inside the bag—it could be numbers, words, lists, or even entire tables of data. A variable is just a label for the bag, helping you find it later, but the real data lives inside the object. In R, data can be stored in different types of objects, like vectors, lists, matrices, and data frames, each designed to hold specific types of information efficiently. Just like different bags are used for different purposes (backpack for clothes, briefcase for documents), R provides different objects to organize and manage data effectively.

2.1. Vectors

Vectors are the simplest R objects. Each vector contains elements of a single data type, and R supports several such types: numeric, integer, logical, complex, character, and raw. Let's examine each type and its associated vector.
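Vectors are usually built with c() ("combine"). Because a vector holds only one type, mixing types makes R coerce every element to the most flexible type present. The sketch below uses illustrative variable names:

```r
# c() combines individual values into one vector
ages <- c(22, 25, 21)
length(ages)   # 3

# Mixing types coerces all elements to a common type (here, character)
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
print(mixed)   # "1" "a" "TRUE"
```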

2.1.1. Numeric

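Numeric values include both whole numbers and decimals. By default, R stores any number you type (such as 16) as a double-precision floating-point value, which class() reports as "numeric".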

a <- 10 + 2 + 4

class(a) # Returns "numeric"

2.1.2. Integer

To explicitly define an integer, use the L suffix. This ensures the value is treated as an integer rather than a numeric (floating-point) value.

b <- 10L + 2L + 4L # 'L' specifies whole numbers without decimals

class(b) # Returns "integer"
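The difference between the numeric and integer types is easier to see with typeof(), which reports the underlying storage type. A short sketch:

```r
a <- 10 + 2 + 4
b <- 10L + 2L + 4L

typeof(a)        # "double": numeric values are stored as floating point
typeof(b)        # "integer"
is.integer(16)   # FALSE: a plain 16 is stored as a double
is.integer(16L)  # TRUE

# Dividing two integers with / gives a numeric (double) result
class(5L / 2L)   # "numeric"
5L / 2L          # 2.5
```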

2.1.3. Logical

Logical data types represent boolean values: TRUE or FALSE.

c <- TRUE

class(c) # Returns "logical"

2.1.4. Complex

Complex numbers in R include a real and an imaginary part, denoted by i.

d <- 1i + 2i + 3i # Complex numbers

class(d) # Returns "complex"
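The two parts of a complex number can be pulled out with the built-in functions Re() and Im(). Mod() returns its modulus. The value 3 + 4i below is only an illustration:

```r
z <- 3 + 4i
Re(z)    # 3, the real part
Im(z)    # 4, the imaginary part
Mod(z)   # 5, the modulus sqrt(3^2 + 4^2)

# The sum from the example above has no real part
d <- 1i + 2i + 3i
print(d) # 0+6i
```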

2.1.5. Character

Character data consists of text or string values, enclosed in single (') or double (") quotes.

e1 <- '10+2+4'

e2 <- "10L+2L+4L"

e3 <- 'TRUE'

e4 <- "1i+2i+3i"

class(e1) # Returns "character"

class(e2) # Returns "character"

class(e3) # Returns "character"

class(e4) # Returns "character"
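Quotation marks turn each of these values into plain text. R does not evaluate the arithmetic inside them, and text converts back to a number only when it is written as a plain number. A minimal sketch:

```r
e1 <- '10+2+4'

# e1 is just six characters of text, not the number 16
nchar(e1)                          # 6

# as.numeric() converts text only if it is a plain number
as.numeric("16")                   # 16
suppressWarnings(as.numeric(e1))   # NA, because "10+2+4" is not a number
```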

2.1.6. Raw

The raw data type stores raw bytes. It is rarely used in standard data analysis but can be useful for handling binary data.

f <- raw(10) # Creates a raw vector of length 10

class(f) # Returns "raw"
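To see what raw vectors hold, the sketch below uses the base functions charToRaw() and rawToChar(). They convert between text and its bytes, which print as hexadecimal codes:

```r
f <- raw(10)
print(f)            # ten zero bytes: 00 00 00 00 00 00 00 00 00 00

# Text viewed as its underlying bytes (ASCII codes in hexadecimal)
bytes <- charToRaw("Hi")
print(bytes)        # 48 69
rawToChar(bytes)    # "Hi"
as.raw(255)         # ff
```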

2.2. Lists

Lists in R are versatile data structures that can store elements of different types, including numbers, characters, vectors, and even other lists.

2.2.1. Creating a List

A list can contain elements of various data types, including named components.

my_list <- list(name = "John", age = 25, grades = c(90, 85, 78))

2.2.2. Accessing List Elements

Elements in a list can be accessed using either double square brackets ([[ ]]) or the $ operator.

print(my_list[["name"]]) # Accessing by name using double brackets

print(my_list$age) # Accessing by name using the $ operator
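Single square brackets also work on lists but behave differently. [[ ]] and $ return the element itself, whereas [ ] returns a smaller list that still wraps the element. Elements can also be reached by position:

```r
my_list <- list(name = "John", age = 25, grades = c(90, 85, 78))

# [[ ]] returns the element itself: a numeric vector
my_list[["grades"]]        # 90 85 78

# [ ] returns a list of length 1 containing the element
class(my_list["grades"])   # "list"

# Access by position
my_list[[2]]               # 25
```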

2.2.3. Creating a Heterogeneous List

Lists can contain different data types within the same structure.

mixed_list <- list("John", 25.5, c(90, 85, 78))

# The function c() combines multiple numeric values into a vector

2.2.4. Creating a Nested List

Lists can also be nested, meaning a list can contain another list as an element.

nested_list <- list(inner_list = list(a = 1, b = 2), c = 3)
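To reach a value inside a nested list, chain the access operators. Each $ or [[ ]] step goes one level deeper:

```r
nested_list <- list(inner_list = list(a = 1, b = 2), c = 3)

nested_list$inner_list$b            # 2
nested_list[["inner_list"]][["a"]]  # 1
nested_list$c                       # 3
```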

2.2.5. Adding Elements to a List

You can append new elements to an existing list using the c() function.

my_list2 <- c(my_list, city = "New York")

2.2.6. Converting a List to a Data Frame

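If the list elements have the same length, or are single values that R can repeat (recycle) to match, you can convert the list into a data frame for easier tabular manipulation. Here, name and age are recycled across the three grades, giving a data frame with three rows.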

my_data_frame <- data.frame(my_list)

2.3. Matrices

Matrices in R are two-dimensional data structures where elements are arranged in rows and columns. They are primarily used for numerical computations and support various mathematical operations.

2.3.1. Creating a Matrix

A matrix can be created using the matrix() function by specifying the data, number of rows (nrow), and number of columns (ncol).

my_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
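matrix() fills its values column by column unless told otherwise. This is why element [2, 3] of my_matrix is 8 rather than 6. The byrow argument switches to row-by-row filling. The small 2×3 matrices below are only illustrations:

```r
# Default: values fill column by column
m_col <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE fills row by row instead
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```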

2.3.2. Accessing Matrix Elements

Matrix elements can be accessed using row and column indices.

print(my_matrix[2, 3]) # Element in the second row, third column: 8, because matrix() fills values column by column

2.3.3. Assigning Row and Column Names

To enhance readability, row and column names can be assigned to a matrix.

colnames(my_matrix) <- c("A", "B", "C")

rownames(my_matrix) <- c("X", "Y", "Z")
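After rows and columns are named, they can be indexed by name as well as by position. A short sketch using the same values as before:

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)  # same values as 1, 2, ..., 9
colnames(my_matrix) <- c("A", "B", "C")
rownames(my_matrix) <- c("X", "Y", "Z")

my_matrix["Y", "C"]   # 8, the same element as my_matrix[2, 3]
my_matrix["X", ]      # the first row: A = 1, B = 4, C = 7
```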

2.3.4. Performing Matrix Operations

Matrices support element-wise arithmetic: +, -, *, and / combine the matching elements of two matrices of the same dimensions. True matrix multiplication uses the separate %*% operator.

matrix_a <- matrix(c(1, 2, 3, 4), nrow = 2)

matrix_b <- matrix(c(5, 6, 7, 8), nrow = 2)

result_matrix <- matrix_a + matrix_b # Element-wise addition
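The distinction between * and %*% matters in practice. The first multiplies matching elements, while the second multiplies rows by columns as in linear algebra:

```r
matrix_a <- matrix(c(1, 2, 3, 4), nrow = 2)
matrix_b <- matrix(c(5, 6, 7, 8), nrow = 2)

# Element-wise product
matrix_a * matrix_b
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# Matrix product
matrix_a %*% matrix_b
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```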

2.3.5. Matrix Functions

Several built-in functions allow manipulation and analysis of matrices.

print(dim(my_matrix)) # Returns the dimensions (rows and columns)

print(t(my_matrix)) # Computes the transpose of the matrix

2.4. Array

An array in R is a multi-dimensional data structure that can store elements of the same data type, typically arranged in two or more dimensions. Arrays are useful for handling complex datasets and performing multi-dimensional computations.

2.4.1. Creating an Array

An array is created using the array() function, where the data parameter specifies the elements, and the dim parameter defines the dimensions.

my_array <- array(data = c(1, 2, 3, 4, 5, 6, 7, 8, 9), dim = c(3, 3, 1))

This creates a 3×3 array with a single layer (depth of 1).
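Arrays become more interesting with more than one layer. In this illustrative 2×3×2 array, the values fill each layer column by column and then move on to the next layer. An empty index selects everything along that dimension:

```r
my_array2 <- array(1:12, dim = c(2, 3, 2))

dim(my_array2)       # 2 3 2
my_array2[, , 2]     # the whole second layer (values 7 to 12)
my_array2[1, 2, 2]   # 9: row 1, column 2 of layer 2
```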

2.4.2. Accessing Array Elements

Elements within an array can be accessed using indices specifying the row, column, and depth.

print(my_array[2, 3, 1]) # Retrieves the element at row 2, column 3, depth 1

2.4.3. Performing Array Operations

Arrays support arithmetic operations, such as element-wise addition, subtraction, multiplication, and division.

array_a <- array(c(1, 2, 3, 4), dim = c(2, 2, 1))

array_b <- array(c(5, 6, 7, 8), dim = c(2, 2, 1))

result_array <- array_a + array_b # Element-wise addition

2.4.4. Array Functions

Several functions help analyze and manipulate arrays.

print(dim(my_array)) # Returns the dimensions (rows, columns, depth)

print(length(my_array)) # Returns the total number of elements

2.5. Factors

Factors in R are used to represent categorical data efficiently. They store both the values and the corresponding levels, making them ideal for handling qualitative data such as gender, education level, or survey responses.

2.5.1. Creating a Factor with Specified Levels

A factor can be created using the factor() function, where the levels argument explicitly defines the categories.

gender <- factor(c("Male", "Female", "Male", "Female"), levels = c("Male", "Female"))

2.5.2. Checking and Modifying Factor Levels

The levels() function allows us to view the categories in a factor.

print(levels(gender)) # Displays the defined levels
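Levels can also be modified by assigning to levels(). This relabels every matching value at once. The sketch works on a separate, illustratively named copy (gender_codes) so that gender itself is unchanged:

```r
gender_codes <- factor(c("Male", "Female", "Male", "Female"),
                       levels = c("Male", "Female"))

# Rename the levels in the same order: "Male" -> "M", "Female" -> "F"
levels(gender_codes) <- c("M", "F")
print(gender_codes)     # M F M F, with Levels: M F

nlevels(gender_codes)   # 2
```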

2.5.3. Ordering Factor Levels

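Factors can also be ordered, which is useful when representing ranked data. Setting ordered = TRUE tells R that the levels have a rank, in the order given by the levels argument.

grade <- factor(c("A", "B", "C"), levels = c("C", "B", "A"), ordered = TRUE) # Ranked from lowest (C) to highest (A)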

print(levels(grade))
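Once a factor is created with ordered = TRUE, comparisons and functions such as max() follow the level order. An unordered factor would not support these comparisons. A brief sketch:

```r
grade <- factor(c("A", "B", "C"), levels = c("C", "B", "A"), ordered = TRUE)

grade[1] > grade[2]   # TRUE: "A" ranks above "B"
grade >= "B"          # TRUE TRUE FALSE
max(grade)            # A
```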

2.5.4. Summarizing Factor Data

Factors can be summarized using the summary() function, which provides a count of occurrences for each level.

summary(gender)

Using factors ensures efficient storage and proper handling of categorical variables in R, making them crucial for statistical analysis and data visualization.

2.6. Data frame

A data frame in R is a structured, table-like object used for storing and managing datasets. A matrix requires all of its elements to share one data type, usually numeric. A data frame, in contrast, can hold columns of different types, including numeric, character, and categorical (factor) variables.

2.6.1. Creating a Data Frame

The data.frame() function is used to construct a data frame with multiple columns of different data types.

student_data <- data.frame(

Name = c("Alice", "Bob", "Charlie"),

Age = c(22, 25, 21),

Grade = c("A", "B", "C")

)

2.6.2. Accessing Data Frame Elements

Specific elements can be accessed using either column names or row-column indexing.

print(student_data$Name) # Accessing the 'Name' column

print(student_data[2, 3]) # Accessing the element in the second row, third column

2.6.3. Checking and Modifying Column Names and Data Types

The names() and str() functions provide insights into the structure of a data frame.

print(names(student_data)) # Displays the column names

str(student_data) # Shows the internal structure of the data frame (str() prints directly, so print() is not needed)
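Column names and types can also be changed in place. The sketch below works on a separate, illustratively named copy (student_copy) so that the student_data used in the following steps keeps its original columns:

```r
student_copy <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(22, 25, 21),
  Grade = c("A", "B", "C")
)

# Convert Grade from character to factor
student_copy$Grade <- factor(student_copy$Grade)
class(student_copy$Grade)   # "factor"

# Convert Age from numeric (double) to integer
student_copy$Age <- as.integer(student_copy$Age)
class(student_copy$Age)     # "integer"

# Rename the first column
names(student_copy)[1] <- "Student"
names(student_copy)         # "Student" "Age" "Grade"
```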

2.6.4. Summarizing Data Frame Contents

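The summary() function gives a statistical summary of numeric columns and frequency counts for factor columns. Character columns, which are the default for text since R 4.0.0, are reported only by their length, class, and mode.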

summary(student_data)

2.6.5. Adding and Removing Columns

New columns can be added dynamically, and existing columns can be removed using indexing.

# Adding a new column

student_data$City <- c("New York", "San Francisco", "Chicago")

# Removing a column

student_data <- student_data[, -4] # Removes the fourth column (City)

2.6.6. Adding and Removing Rows

Rows can be added to a data frame using the rbind() function, and unwanted rows can be removed by subsetting the data frame.

Adding a Row

To append a new row, ensure that the new row has the same column structure as the existing data frame.

new_student <- data.frame(Name = "David", Age = 23, Grade = "B")

student_data <- rbind(student_data, new_student)

Removing a Row

Rows can be removed using negative indexing.

student_data <- student_data[-2, ] # Removes the second row

2.6.7. Subsetting a Data Frame

A subset of the data can be extracted using logical conditions.

young_students <- student_data[student_data$Age < 25, ]

2.6.8. Updating Data in a Data Frame

Modifying values in a data frame can be done by directly assigning new values to specific elements, rows, or columns.

Updating a Specific Value

A particular cell in the data frame can be updated using row and column indexing.

student_data[1, 2] <- 23 # Updates Alice's Age to 23

Updating an Entire Column

A whole column can be modified by assigning new values.

student_data$Grade <- c("A+", "C+", "B+") # Updates all grades; one value per row (Alice, Charlie, David) after Bob's row was removed

Updating Multiple Rows Based on a Condition

Rows that satisfy a condition can be updated efficiently.

student_data$Age[student_data$Name == "Charlie"] <- 22 # Updates Charlie's Age to 22

2.6.9. Row and Column Names in a Data Frame

Row and column names in a data frame provide meaningful labels, improving data interpretation and ease of access.

Checking Row and Column Names

You can retrieve the names of rows and columns using the rownames() and colnames() functions.

print(rownames(student_data)) # Displays row names

print(colnames(student_data)) # Displays column names

Setting Column Names

Column names can be modified using the colnames() function.

colnames(student_data) <- c("Student_Name", "Student_Age", "Student_Grade")

Changing a Specific Column Name

To rename a single column, modify the corresponding index within colnames().

colnames(student_data)[2] <- "Age_in_Years" # Renames the second column

Setting Row Names

Row names can be assigned using the rownames() function.

rownames(student_data) <- c("S1", "S2", "S3")

Changing a Specific Row Name

To rename a specific row, update the corresponding index in rownames().

rownames(student_data)[1] <- "Student_A" # Renames the first row

Removing Row and Column Names

To remove row or column names, assign NULL.

rownames(student_data) <- NULL # Resets row names to the default 1, 2, 3

colnames(student_data) <- NULL # Removes column names (columns can then no longer be accessed with $)

3. R Operators

Operators in R are used to perform calculations, comparisons, and logical evaluations. The main types of operators include arithmetic, relational, and logical operators.

3.1. Arithmetic Operators

Arithmetic operators perform basic mathematical computations.

# Addition ('+')

result <- 5 + 3

print(result) # Output: 8

# Subtraction ('-')

result <- 5 - 3

print(result) # Output: 2

# Multiplication ('*')

result <- 4 * 6

print(result) # Output: 24

# Division ('/')

result <- 10 / 2

print(result) # Output: 5

# Exponentiation ('^' or '**')

result <- 4 ^ 3

print(result) # Output: 64
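R also has operators for the remainder and for the whole-number part of a division. Like the operators above, they work element by element on vectors:

```r
17 %% 5           # 2: remainder after division (modulus)
17 %/% 5          # 3: integer division

c(1, 2, 3) * 10   # 10 20 30: arithmetic applies to each element
```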

3.2. Relational Operators

Relational operators compare values and return a logical (TRUE or FALSE) output.

# Equal to ('==')

result <- 5 == 5

print(result) # Output: TRUE

# Not equal to ('!=')

result <- 3 != 7

print(result) # Output: TRUE

# Greater than ('>')

result <- 10 > 5

print(result) # Output: TRUE

# Less than ('<')

result <- 3 < 8

print(result) # Output: TRUE

# Greater than or equal to ('>=')

result <- 6 >= 6

print(result) # Output: TRUE

# Less than or equal to ('<=')

result <- 6 <= 6

print(result) # Output: TRUE
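Relational operators applied to a vector compare every element and return one TRUE or FALSE per element. Because TRUE counts as 1 and FALSE as 0, sum() can count the matches. The ages below are illustrative:

```r
ages <- c(22, 25, 21, 30)

ages > 23        # FALSE TRUE FALSE TRUE
sum(ages > 23)   # 2 values are greater than 23
```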

3.3. Logical Operators

Logical operators are used to evaluate conditions and return Boolean (TRUE or FALSE) results.

# AND ('&' works element-wise on vectors, '&&' on single values)

result <- FALSE & TRUE

print(result) # Output: FALSE

# OR ('|' works element-wise on vectors, '||' on single values)

result <- TRUE | FALSE

print(result) # Output: TRUE

# NOT ('!')

result <- !TRUE

print(result) # Output: FALSE
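The difference between & and && shows up with vectors and with missing values. & compares vectors element by element. && and || expect single values, and they stop as soon as the answer is known, so the right-hand side may never be evaluated. Recent versions of R raise an error if && or || receive a vector longer than one:

```r
# Element-wise AND
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE

# Short-circuit OR: since is.na(x) is TRUE, x > 5 is never evaluated
x <- NA
is.na(x) || x > 5                             # TRUE
```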

4. Decision-Making in R

4.1. If-else Statements

Decision-making structures allow R programs to execute specific code blocks based on certain conditions. The if, if-else, and if-else if statements are commonly used for conditional execution.

4.1.1. Simple if Statement

The if statement executes a block of code only if a specified condition evaluates to TRUE.

# Example: Checking if x is greater than 5

x <- 10

if (x > 5) {

print("x is greater than 5")

}

4.1.2. if-else Statement

The if-else statement provides an alternative block of code that runs when the condition is FALSE.

# Example: Checking if y is greater than 5

y <- 3

if (y > 5) {

print("y is greater than 5")

} else {

print("y is not greater than 5")

}

4.1.3. if-else if Statement

The if-else if structure allows checking multiple conditions sequentially. The first condition that evaluates to TRUE executes, and the remaining conditions are ignored.

# Example: Checking if z is positive, negative, or zero

z <- 0

if (z > 0) {

print("z is positive")

} else if (z < 0) {

print("z is negative")

} else {

print("z is zero")

}

4.2. Switch Statement

The switch statement in R provides an efficient way to handle multiple conditions by evaluating an expression and executing the corresponding code block based on matching values. It is particularly useful when multiple conditions need to be checked against a single variable.

4.2.1. Using switch to Handle Multiple Conditions

# Example: Determining the type of day based on input

day <- "Monday"

switch(day,

"Monday" = {

print("It's the start of the week.")

},

"Wednesday" = {

print("It's the middle of the week.")

},

"Friday" = {

print("It's the end of the week.")

},

"Saturday" = {

print("It's the weekend.")

},

"Sunday" = {

print("It's the weekend.")

},

print("Invalid day.") # Default case if no match is found

)

Explanation:

The switch function evaluates the value of day.
If day matches one of the specified cases (e.g., "Monday", "Wednesday"), the corresponding block of code is executed.
If no match is found, the default case executes (print("Invalid day.")).

The switch statement simplifies decision-making by reducing the need for multiple if-else conditions, making the code more readable and efficient.
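switch() also returns the value of the matched branch, so its result can be stored or returned from a function. A branch left empty falls through to the next one with a value, which lets several cases share one result. The function name day_type below is hypothetical:

```r
day_type <- function(day) {
  switch(day,
    "Saturday" = ,          # empty: falls through to "Sunday"
    "Sunday" = "weekend",
    "Monday" = ,
    "Tuesday" = ,
    "Wednesday" = ,
    "Thursday" = ,
    "Friday" = "weekday",
    "invalid"               # default when nothing matches
  )
}

day_type("Saturday")    # "weekend"
day_type("Wednesday")   # "weekday"
day_type("Holiday")     # "invalid"
```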

5. Iteration in R

5.1. For Loops

The for loop in R is used to iterate over sequences, vectors, and other data structures, allowing efficient execution of repetitive tasks.

5.1.1. Iterating Over a Sequence of Numbers

# Looping through numbers 1 to 5

for (i in 1:5) {

print(i)

}

Explanation:

The loop iterates from 1 to 5, printing each value.

5.1.2. Iterating Over Elements of a Vector

# Looping through a character vector

fruits <- c("apple", "banana", "orange", "grape")

for (x in fruits) {

print(x)

}

Explanation:

The loop iterates over each element in the fruits vector and prints it.

5.1.3. Performing Calculations Within a Loop

# Summing elements of a vector

numbers <- c(2, 4, 6, 8, 10)

result <- 0 # Start the running total at 0

for (num in numbers) {

result <- result + num

}

print(result)

Explanation:

The loop iterates through the numbers vector, adding each value to result.
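The loop is a good way to see how accumulation works. R also has built-in vectorized functions that give the same total without an explicit loop. A short comparison (the name total is illustrative):

```r
numbers <- c(2, 4, 6, 8, 10)

# Loop version
total <- 0
for (num in numbers) {
  total <- total + num
}
print(total)      # 30

# Vectorized equivalents
sum(numbers)      # 30
cumsum(numbers)   # 2 6 12 20 30: the running total after each step
```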

5.1.4. Nested for Loops

# Example of nested loops

for (i in 1:3) {

for (j in 1:2) {

print(paste("i =", i, ", j =", j))

} # End of inner loop

} # End of outer loop

Explanation:

The inner loop runs completely for each iteration of the outer loop.
The paste() function combines i and j values into a formatted string.

5.2. While Loops

The while loop in R is used for executing a block of code repeatedly as long as a specified condition remains TRUE. It is particularly useful when the number of iterations is not known in advance.

5.2.1. Simple Counting Using a while Loop

# Counting from 1 to 5 using a while loop

count <- 1

while (count <= 5) {

print(count)

count <- count + 1

}

Explanation:

The loop continues executing as long as count is less than or equal to 5.
The count variable is incremented in each iteration to prevent an infinite loop.

5.2.2. Summing Numbers Using a while Loop

# Summing elements of a vector using a while loop

numbers <- c(2, 4, 6, 8, 10)

sum_result <- 0

index <- 1

while (index <= length(numbers)) {

sum_result <- sum_result + numbers[index]

index <- index + 1

}

print(sum_result)

Explanation:

The loop iterates through the numbers vector, adding each element to sum_result.
The index variable ensures all elements are processed sequentially.

5.2.3. User Input Validation Using a while Loop

# Validating user input to ensure it falls within a specific range

user_input <- -1

while (is.na(user_input) || user_input < 0 || user_input > 10) { # is.na() catches non-numeric input, which as.numeric() turns into NA

cat("Enter a number between 0 and 10: ")

user_input <- as.numeric(readline())

}

print(paste("You entered:", user_input))

Explanation:

The loop repeatedly prompts the user until they enter a valid number between 0 and 10.
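The readline() function reads a line of user input as text, which as.numeric() converts to a number for validation. readline() only waits for input in an interactive session; in a non-interactive script it returns an empty string immediately, so this example is meant to be run interactively.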

5.2.4. Handling Infinite Loops with a Break Statement

# Infinite loop with a break condition

count <- 1

while (TRUE) {

print(count)

count <- count + 1

if (count > 5) {

break # Exits the loop when count exceeds 5

} # End of if

} # End of while

Explanation:

The while (TRUE) construct creates an infinite loop.
The break statement ensures the loop exits once count exceeds 5.

6. Functions in R

Functions in R allow for code reusability and modular programming. They enable users to encapsulate logic, making code more readable and efficient. Functions can accept arguments, return values, and have default parameters.

6.1. Creating a Simple Function

# Function to compute the square of a number

square <- function(x) {

return(x^2)

}

# Calling the function

result <- square(6)

print(result)

Explanation:

The function square() takes a single argument x and returns its square.
It is then called with the value 6, and the result is printed.

6.2. Function with Multiple Arguments

# Function to calculate the sum of squares of two numbers

sum_of_squares <- function(a, b) {

return(a^2 + b^2)

}

# Calling the function

result <- sum_of_squares(3, 4)

print(result)

Explanation:

The function sum_of_squares() takes two arguments, a and b, and returns the sum of their squares.

6.3. Function with Default Arguments

# Function to compute the power of a number with a default exponent

power <- function(x, exponent = 2) {

return(x^exponent)

}

# Calling the function with and without specifying the exponent

result1 <- power(3) # Defaults to exponent 2

result2 <- power(3, 3) # Exponent explicitly set to 3

print(result1)

print(result2)

Explanation:

The function power() computes x raised to a given exponent.
If no exponent is provided, it defaults to 2.
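Arguments can also be passed by name, in any order. Because ^ is vectorized, the same function works on whole vectors. A brief sketch using the power() function defined above:

```r
power <- function(x, exponent = 2) {
  return(x^exponent)
}

power(exponent = 3, x = 2)   # 8: named arguments, order does not matter
power(x = 5)                 # 25: the default exponent is used
power(c(1, 2, 3))            # 1 4 9: applied to each element
```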

6.4. Returning Multiple Values

# Function to compute summary statistics of a dataset

calculate_stats <- function(data) {

mean_val <- mean(data)

median_val <- median(data)

sd_val <- sd(data)

return(list(mean = mean_val, median = median_val, sd = sd_val))

}

# Calling the function

data <- c(1, 2, 3, 4, 5)

result <- calculate_stats(data)

# Accessing elements from the returned list

print(result$mean)

print(result$median)

print(result$sd)

Explanation:

The function calculate_stats() takes a numeric vector as input.
It calculates and returns a list containing the mean, median, and standard deviation of the data.

7. Strings in R

R provides extensive functionality for string manipulation, including creation, concatenation, indexing, length measurement, case conversion, pattern matching, and replacement.

7.1. Creating and Concatenating Strings

# Defining strings using single and double quotes

my_string1 <- 'Hello, World!'

my_string2 <- "R programming"

# Concatenating strings using paste()

combined_string <- paste(my_string1, my_string2)

print(combined_string)

Explanation:

Strings can be defined using either single or double quotes.
The paste() function combines multiple strings into a single string.
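paste() separates its pieces with a space by default. The sep argument changes the separator, paste0() uses none, and collapse joins the elements of a single vector into one string. The "gene" labels below are illustrative:

```r
my_string1 <- 'Hello, World!'
my_string2 <- "R programming"

paste(my_string1, my_string2)               # "Hello, World! R programming"
paste(my_string1, my_string2, sep = " | ")  # "Hello, World! | R programming"
paste0("gene", 1:3)                         # "gene1" "gene2" "gene3"
paste(c("A", "B", "C"), collapse = "-")     # "A-B-C"
```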

7.2. Accessing Characters in a String

# Accessing individual characters

first_char <- substr(my_string1, 1, 1)

second_char <- substr(my_string1, 2, 2)

print(first_char)

print(second_char)

Explanation:

The substr() function allows accessing specific characters by specifying the start and stop positions.

7.3. Measuring String Length

# Getting the length of a string

length_of_string <- nchar(my_string1)

print(length_of_string)

Explanation:

The nchar() function returns the total number of characters in a string, including spaces and punctuation.

7.4. Extracting Substrings

# Extracting a portion of a string

substring_example <- substr(my_string1, start = 1, stop = 5)

print(substring_example)

Explanation:

The substr() function extracts a substring from a given start to stop position.

7.5. Changing Case

# Converting to uppercase and lowercase

uppercase_string <- toupper(my_string1)

lowercase_string <- tolower(my_string1)

print(uppercase_string)

print(lowercase_string)

Explanation:

The toupper() function converts a string to uppercase.
The tolower() function converts a string to lowercase.

7.6. Searching and Matching Strings

# Searching for a pattern in a string using grep()

matching_values <- grep("World", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

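The grep() function searches each element of a character vector for a pattern. With value = TRUE it returns the matching elements themselves; otherwise it returns their positions. ignore.case = TRUE makes the search case-insensitive.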
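grep() and its companion grepl() return the results in different forms. grep() gives positions or matching values, while grepl() gives one TRUE or FALSE per element, which is handy for filtering. A sketch on a small vector:

```r
fruits <- c("apple", "banana", "orange", "grape")

grep("an", fruits)                 # 2 3: positions of the matches
grep("an", fruits, value = TRUE)   # "banana" "orange"
grepl("an", fruits)                # FALSE TRUE TRUE FALSE
```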

7.7. Replacing Substrings

# Replacing a substring using gsub()

modified_string <- gsub("Hello", "Hi", my_string1)

print(modified_string)

Explanation:

The gsub() function replaces all occurrences of a pattern with a specified replacement.
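gsub() has a sibling, sub(), that replaces only the first occurrence. The difference is clear on a word with repeated letters:

```r
fruit <- "banana"

sub("a", "o", fruit)    # "bonana": only the first "a" is replaced
gsub("a", "o", fruit)   # "bonono": every "a" is replaced
```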

7.8. Comparing Strings

# Comparing two strings

comparison_result <- my_string1 == my_string2

print(comparison_result)

Explanation:

The == operator checks whether two strings are identical and returns TRUE or FALSE.

7.9. Combining Strings with Numbers

# Combining a string with a numeric value

age <- 25

info_string <- paste("My age is", age, "years.")

print(info_string)

Explanation:

The paste() function automatically converts numeric values to text, so strings and numbers can be combined directly.
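When numbers need a specific format, such as a fixed number of decimals, sprintf() offers finer control through placeholders: %d for whole numbers and %.1f for one decimal place. The weight value below is illustrative:

```r
age <- 25
weight_kg <- 68.456   # illustrative value

sprintf("My age is %d years.", age)     # "My age is 25 years."
sprintf("Weight: %.1f kg", weight_kg)   # "Weight: 68.5 kg"
```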

8. Regular Expression in R

Regular expressions (RegEx) are powerful tools for pattern matching and text manipulation in R. The grep() function is commonly used to search for patterns in character vectors. Below are key RegEx concepts with examples.

8.1. Metacharacters: Basic Pattern Matching

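The simplest pattern is a plain substring: grep() returns every element that contains it. The metacharacters introduced in the following sections make patterns more flexible.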

my_string1 <- c("apple", "banana", "orange", "grape")

# Matching strings that contain "app"

matching_values <- grep("app", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

This searches for occurrences of "app" within the character vector, ignoring case.

8.2. Square Brackets [ ]: Match Any One Character Inside the Brackets

# Matching strings containing "g" or "r"

matching_values <- grep("[gr]", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The pattern [gr] matches any string containing either "g" or "r".

8.3. Ranges -: Match Any One Character Within a Specified Range

# Matching strings containing any letter from "o" to "s"

matching_values <- grep("[o-s]", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The pattern [o-s] matches strings that contain any character within the range "o" to "s".

8.4. Quantifiers: Controlling the Number of Occurrences

Quantifiers specify how many times a character or pattern should appear.

Symbol Meaning

* 0 or more occurrences

+ 1 or more occurrences

? 0 or 1 occurrence

{n} Exactly n occurrences

{n,} n or more occurrences

{n,m} Between n and m occurrences

# Matching strings with two or more occurrences of 'p'

matching_values <- grep("p{2,}", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The pattern "p{2,}" ensures that only strings with at least two consecutive "p"s are matched.

8.5. Anchors: Matching Start (^) and End ($) of a String

# Matching strings that start with "gr"

matching_values <- grep("^gr", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The caret (^) anchors the pattern to the beginning of the string.

# Matching strings that end with "a"

matching_values <- grep("a$", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The dollar sign ($) anchors the pattern to the end of the string.

8.6. Escaping Special Characters \\: Treating Symbols Literally

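Some symbols have special meanings in RegEx (e.g., ., *, +). To match them literally, escape them with a backslash. Because the backslash is also the escape character inside R strings, it has to be written twice: the R string "\\." reaches the regex engine as \. and matches a literal period.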

# Matching a period (.)

my_string2 <- c("apple", "banana", "ora.nge", "grape")

matching_values <- grep("\\.", my_string2, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The pattern "\\." ensures that only strings containing an actual period (.) are matched.
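Escaping is needed because an unescaped . matches any character. A common alternative is grep()'s fixed = TRUE argument, which treats the whole pattern as plain text:

```r
my_string2 <- c("apple", "banana", "ora.nge", "grape")

# Unescaped "." matches any character, so every string matches
grep(".", my_string2, value = TRUE)                 # all four strings

# fixed = TRUE matches the pattern literally, no escaping needed
grep(".", my_string2, value = TRUE, fixed = TRUE)   # "ora.nge"
```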

8.7. Alternation |: Matching Multiple Patterns

# Matching strings containing either "ban" or "ora"

matching_values <- grep("ban|ora", my_string1, value = TRUE, ignore.case = TRUE)

print(matching_values)

Explanation:

The pipe (|) acts as an OR operator, matching either "ban" or "ora".

8.8. Real-Life Example: Extracting Phone Numbers

text_data <- c("This is a telephone directory",

"Call Sam at 123-456-7890",

"Office: 987-654-3210",

"No number here.")

# Finding the lines that contain a phone number (grep() returns the whole lines)

phone_numbers <- grep("\\d{3}-\\d{3}-\\d{4}", text_data, value = TRUE)

print(phone_numbers)

Explanation:

\\d matches any digit (0-9).
{3}- ensures that three digits are followed by a hyphen.
{4} enforces a four-digit sequence at the end, forming a standard phone number format (XXX-XXX-XXXX).
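grep() returns the whole lines that contain a match. To pull out only the phone numbers, one option is base R's regexpr() together with regmatches(). regexpr() locates the first match in each string, returning -1 where there is none, and regmatches() extracts just the matched text. The name pattern is illustrative:

```r
text_data <- c("This is a telephone directory",
               "Call Sam at 123-456-7890",
               "Office: 987-654-3210",
               "No number here.")

pattern <- "\\d{3}-\\d{3}-\\d{4}"

# Lines without a match are dropped from the result
phone_numbers <- regmatches(text_data, regexpr(pattern, text_data))
print(phone_numbers)   # "123-456-7890" "987-654-3210"
```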

9. Data Input and Output in R

9.1. Checking and Setting the Working Directory

Before working with files, it is important to ensure that the correct working directory is set.

# Check the current working directory

getwd()

# Set a new working directory

setwd("C:/Users/username/Desktop/R")

The getwd() function returns the current working directory, while setwd() allows setting a specific location where files will be read from or saved to.

9.2. Reading Data from Files

Reading CSV Files

# Read a CSV file into a data frame

data <- read.csv("file.csv")

The read.csv() function loads comma-separated data into an R data frame. The first row is assumed to contain column names unless specified otherwise.

Reading Text Files with Custom Delimiters

# Read a text file with tab-separated values

data <- read.table("file.txt", header = TRUE, sep = "\t")

For non-CSV formats, read.table() provides more flexibility, allowing the user to specify custom delimiters such as tabs ("\t") or semicolons (";").

9.3. Writing Data to Files

Writing Data to a CSV File

write.csv(data, "output_file.csv", row.names = FALSE)

This function saves an R data frame to a CSV file. Setting row.names = FALSE prevents row indices from being written.

Writing Data to a Text File with a Custom Delimiter

write.table(data, "output_file.txt", sep = "\t", row.names = FALSE)

The write.table() function provides greater control over formatting by allowing custom separators such as tabs ("\t") or spaces.
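The writing and reading functions can be checked with a round trip: save a small data frame, then read it back. This sketch assumes R 4.0 or later, where text columns are read back as character. It uses tempfile() for throwaway paths so that no working directory has to be set. The data frame and variable names are illustrative:

```r
example_df <- data.frame(Name = c("Alice", "Bob"), Age = c(22, 25))

# CSV round trip
csv_path <- tempfile(fileext = ".csv")
write.csv(example_df, csv_path, row.names = FALSE)
csv_back <- read.csv(csv_path)

# Tab-separated round trip
tsv_path <- tempfile(fileext = ".txt")
write.table(example_df, tsv_path, sep = "\t", row.names = FALSE)
tsv_back <- read.table(tsv_path, header = TRUE, sep = "\t")

# Inspect what came back (Age is read back as integer)
str(csv_back)
str(tsv_back)
```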
