How Statistics relies on data.

1. Appreciate how Statistics relies on data.

2. Understand what data is and the purpose of collecting data.

3. Understand the process of data collection and the source of data.

4. Distinguish between Structured and Unstructured data.

5. Appreciate the usefulness of Structured data.

6. Create, download, and manipulate datasets.

7. Work on a subset of a dataset.

8. Identify variables and observations from the dataset, and appreciate the difference between having no data and having data whose value is zero.

9. Frame questions from the dataset relevant to the information being sought.

Statistics and Data: Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data. It provides a way to draw conclusions from data and make informed decisions.
Understanding Data: Data is a collection of facts, such as numbers, words, measurements, observations, or descriptions of things. The purpose of collecting data is to gather empirical evidence that can be analyzed to identify trends and patterns or to test hypotheses.
Data Collection Process: Data collection involves gathering information from various sources. This could be through surveys, experiments, observations, or existing databases. The source of data could be primary (collected first-hand) or secondary (collected by someone else).
Structured and Unstructured Data: Structured data is highly organized in a fixed format, typically rows and columns, so it is easy to search and query in databases. Examples include spreadsheet tables (such as Excel files) and SQL database tables. Unstructured data, on the other hand, is not organized in a predefined manner and includes formats like free text, images, audio files, etc.
Usefulness of Structured Data: Structured data is easier to analyze and is often immediately useful. It can be easily entered, stored, queried, and analyzed, and it is the usual input format for many machine learning models and data analytics tools.
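To see why a fixed structure makes querying easy, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `students`, its columns, and the values are hypothetical and only for illustration.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical examples
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, gender TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Asha", "F", 3.0), ("Ravi", "M", 5.0), ("Meena", "F", 2.0)],
)

# Because every row has the same columns, a question becomes a short query:
# "What is the average time spent, by gender?"
avg_hours_by_gender = dict(
    conn.execute(
        "SELECT gender, AVG(hours) FROM students GROUP BY gender ORDER BY gender"
    ).fetchall()
)
conn.close()

print(avg_hours_by_gender)
```

The same question asked of free text (for example, a paragraph describing each student) would first need the facts to be extracted into columns like these.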
Creating and Manipulating Datasets: Datasets can be created manually or downloaded from various sources. Tools like Python’s pandas library can be used to manipulate datasets, such as cleaning data, handling missing values, and transforming data.
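A minimal pandas sketch of creating a dataset and then cleaning, checking for missing values, and transforming it. The names, values, and the file name `survey.csv` in the comment are hypothetical; `pd.read_csv` is the usual way to load a downloaded CSV file (it accepts a local path or a URL).

```python
import pandas as pd

# Creating a small dataset by hand (names and values are hypothetical)
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Ravi", "Kiran"],
    "age": [21, 34, None, 34, 29],
    "income": [25000, 30000, 28000, 30000, None],
})
# A downloaded file would be loaded instead with, e.g.:
# df = pd.read_csv("survey.csv")  # hypothetical file name; a URL also works

# Cleaning: remove the exact duplicate row (Ravi was entered twice)
df = df.drop_duplicates()

# Handling missing values: count them per variable before deciding what to do
missing_per_column = df.isna().sum()

# One option: keep only the complete observations
complete_rows = df.dropna()

# Transforming: derive a new variable (income in thousands)
df["income_k"] = df["income"] / 1000

print(missing_per_column)
print(df)
```

Dropping incomplete rows is only one choice; whether to drop, keep, or fill missing values depends on the question being asked.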
Working on a Subset of a Dataset: Often, we might not need to use the entire dataset for our analysis. In such cases, we can create a subset of the dataset that includes only the relevant variables or observations.
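A short pandas sketch of taking a subset, either of variables (columns), of observations (rows), or both. The `survey` table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical survey: each row is an observation, each column a variable
survey = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "age": [21, 34, 27, 45],
    "city": ["Pune", "Delhi", "Pune", "Chennai"],
    "income": [25000, 30000, 28000, 52000],
})

# Subset of variables: keep only the columns the analysis needs
age_income = survey[["age", "income"]]

# Subset of observations: keep only the rows that meet a condition
pune_residents = survey[survey["city"] == "Pune"]

# Both at once: selected rows and selected columns
pune_income = survey.loc[survey["city"] == "Pune", ["name", "income"]]

print(pune_income)
```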
Identifying Variables and Observations: Variables are the characteristics that we measure or observe (like age, gender, income, etc.). Observations are the individual entities that we're studying (like people, countries, etc.). It's important to distinguish between having no data (a missing value) and a data value of zero: a missing value means nothing was recorded, while zero is a recorded measurement. Treating one as the other changes results such as averages and counts.
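A small sketch, using only Python's standard library, of how confusing "no data" with zero changes a result. The hours below are hypothetical; `None` stands for a value that was never recorded.

```python
from statistics import mean

# Hypothetical hours per week spent on a hobby by five people.
# None = no data recorded; 0 = recorded as zero hours.
hours = [2, 0, None, 3, 0]

# Wrong: treating the missing value as zero drags the average down
as_if_zero = mean(h if h is not None else 0 for h in hours)

# Better: average only the values that were actually observed
observed = [h for h in hours if h is not None]
observed_mean = mean(observed)

print(as_if_zero, observed_mean)
```

Here the person with no record is counted as spending nothing in the first average (1 hour) but is left out of the second (1.25 hours); the two genuine zeros are kept in both.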
Framing Questions from the Dataset: Once we have a clean, organized dataset, we can start asking questions relevant to the information we’re seeking. This could be simple descriptive questions (like “What is the average income?”) or more complex inferential questions (like “Is there a relationship between education and income?”).
Congratulations on scoring full marks! Here’s the explanation for each question:
1. Structured Data in a Spreadsheet: In a spreadsheet, each observation (or case) is usually represented by a row, and each variable (or attribute) is represented by a column. For each variable, the same type of value is recorded for each case. So, all three statements are correct.
2. Type of Data for Song Lyrics: Song lyrics are considered unstructured data because they don’t follow a predefined model or format.
3. Non-variable Data: In the given dataset, ‘Locality’ is not a variable because it remains constant for all observations (all families live in the same locality).
4. Cases and Variables for Temperature and Humidity Data: In this scenario, each time point (e.g., 9 AM, 9:30 AM, 10 AM) is a case, and the variables are temperature and relative humidity.
5. Examples of Unstructured Data: Web pages, video files, and image files are all examples of unstructured data. Phone numbers and email addresses, on the other hand, are structured data because they follow a specific format.
6. True Statements about Data: Unstructured data cannot be used directly to perform statistical analysis. This is because it doesn’t follow a predefined model or format, making it difficult to analyze without further processing.
7. Type of Data in Table 1.2.2: The data in Table 1.2.2 is structured because it follows a specific format where each observation (student) has its own row and each variable (name, gender, amount of time, favorite artist) has its own column.
8. Variables and Observations in Table 1.2.2: In this dataset, each student is an observation, and the variables are name, gender, amount of time, and favorite artist.
9. Statements about Aakash’s Painting Time: Statement 2 is correct. We cannot draw any conclusion about Aakash because no data is available for him. Statement 1 is incorrect because no recorded value shows that Aakash spends 0 hours on painting; missing data is not the same as zero.
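Questions 2, 5 and 6 rest on the idea that unstructured data needs processing before statistical analysis. A minimal sketch of one such step: turning a line of text into a structured table of word counts. The text is made up (standing in for song lyrics), and word counting is just one simple processing choice among many.

```python
import re
from collections import Counter

# A made-up line standing in for song lyrics (unstructured data)
lyrics = "Sing it loud, sing it clear, sing it all night long"

# Processing step: split the raw text into lowercase words
words = re.findall(r"[a-z']+", lyrics.lower())

# Structured result: (word, count) rows, sorted by count then alphabetically
word_counts = Counter(words)
table = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in table:
    print(word, count)
```

Once the text has become rows with a "word" variable and a "count" variable, ordinary statistical summaries apply to it.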
I hope this helps! If you have any more questions, feel free to ask.
1. Classify the variables in a dataset as Categorical and Numerical.
2. Understand that the categorical variable identifies group membership.
3. Understand that a numerical variable describes the numerical properties of the observations and that the data must share a common unit.
4. Further, classify numerical variables as discrete and continuous.
5. Understand time series versus cross-sectional data.
Sure, let’s break down these concepts:
Classifying Variables: Variables in a dataset can be classified as either categorical or numerical.
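Categorical Variables: These are variables whose values are categories or labels rather than measured quantities. The categories often have no natural order or priority, e.g., Gender (Male/Female) or Yes/No answers, although some categorical variables, such as education level, do have an order.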
Numerical Variables: These are variables that contain numerical values. E.g., Age, Salary, etc.
Categorical Variable Group Membership: A categorical variable identifies group membership. That is, it tells us which category or group an observation belongs to.
Numerical Variable Properties: A numerical variable describes the numerical properties of the observations. The data for a numerical variable must share a common unit to be comparable. For example, a variable ‘Height’ could be measured in centimeters or in inches, but every observation must use the same unit; a mix of the two cannot be compared or averaged directly.
Discrete and Continuous Numerical Variables: Numerical variables can further be classified as discrete or continuous.
Discrete Numerical Variables: These are variables that have a countable number of values. E.g., Number of pets a person has.
Continuous Numerical Variables: These are variables that can take on any value within a certain range. E.g., Height, Weight, etc.
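A rough sketch of the classification above, using hypothetical survey columns. The `classify` helper is a hypothetical, type-based heuristic, not a rule: a column of PIN codes is made of numbers yet is categorical, and heights rounded to whole centimeters are still continuous, so each variable's meaning should always be checked by hand.

```python
from collections import Counter

# Hypothetical survey columns
columns = {
    "gender": ["F", "M", "F", "M"],
    "pets": [0, 2, 1, 3],
    "height_cm": [158.2, 171.5, 162.0, 180.3],
}

def classify(values):
    """Heuristic guess based only on Python types (see caveats above)."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "numerical (discrete)"      # whole-number counts
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical (continuous)"    # measurements with decimals
    return "categorical"                   # labels such as F / M

kinds = {name: classify(values) for name, values in columns.items()}

# A categorical variable identifies group membership: count each group
group_sizes = Counter(columns["gender"])

print(kinds)
print(group_sizes)
```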
Time Series vs Cross-Sectional Data:
Time Series Data: This is a set of observations on a variable or multiple variables over time. E.g., Daily temperature readings.
Cross-Sectional Data: This is data collected on multiple entities at the same or approximately the same point in time. E.g., A survey of households to collect data on their income, number of members, etc.
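A minimal sketch contrasting the two shapes of data, with hypothetical values: a time series follows one entity across several dates, while a cross-section looks at several entities at about the same time. The helper `is_time_ordered` is hypothetical and only checks that the dates increase.

```python
from datetime import date

# Time series (hypothetical): one variable for one place, recorded daily
daily_max_temp = {
    date(2024, 1, 1): 18.0,
    date(2024, 1, 2): 19.5,
    date(2024, 1, 3): 17.0,
}

# Cross-section (hypothetical): several households surveyed at one time
household_survey_2024 = [
    {"household": "H1", "income": 32000, "members": 4},
    {"household": "H2", "income": 45000, "members": 3},
    {"household": "H3", "income": 27000, "members": 5},
]

def is_time_ordered(series):
    """Return True if the dates (keys) are in strictly increasing order."""
    keys = list(series)
    return all(a < b for a, b in zip(keys, keys[1:]))

# Typical questions differ: change over time vs. comparison across entities
temp_change = daily_max_temp[date(2024, 1, 3)] - daily_max_temp[date(2024, 1, 1)]
largest_household = max(household_survey_2024, key=lambda h: h["members"])["household"]

print(temp_change, largest_household)
```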

ajaydev_in
