What is Data?

Data are values of qualitative or quantitative variables belonging to a set of items.

Set of items: The set of objects you are interested in.

Variable: A measurement or characteristic of an item.

Qualitative: Country of origin, sex, treatment

Quantitative: Height, weight, blood pressure
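The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (country, sex, treatment, height_cm, weight_kg) and the helper variable_kind are all hypothetical and not part of the original notes. Each dict stands for one item, each key for a variable, and the helper labels a variable quantitative when all of its values are numbers.

```python
# Toy records: each dict is one item (here, a hypothetical person in a study).
# All names and values are invented for illustration only.
items = [
    {"country": "Brazil", "sex": "F", "treatment": "drug",
     "height_cm": 162.0, "weight_kg": 58.5},
    {"country": "India", "sex": "M", "treatment": "placebo",
     "height_cm": 175.5, "weight_kg": 72.0},
    {"country": "Kenya", "sex": "F", "treatment": "drug",
     "height_cm": 168.2, "weight_kg": 64.3},
]


def variable_kind(values):
    """Label a variable quantitative if every value is a number, else qualitative."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"


# Collect the values of each variable across the whole set of items.
variables = {name: [item[name] for item in items] for name in items[0]}
kinds = {name: variable_kind(values) for name, values in variables.items()}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Checking the Python type is a rough heuristic, not a general rule: a category stored as a number (a postal code, for example) would be mislabeled as quantitative, so in practice the label should come from what the variable means.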

The first thing to notice in this definition is that you need a set of items to measure things on. In statistical inference, this set of items is sometimes called the population. It is what you are trying to discover something about: it might be the set of all websites, the set of all people coming to websites, or the set of all people getting a particular drug. In general, it is the set of things that you are going to make measurements on.

The data are actually the second most important thing in data science. The most important thing is the question you are trying to answer, so the data should follow from that question. Often the data will limit or enable the questions you can ask. In other words, you start with the question, but you might not have the data to answer it, so you have to modify the question and answer a sub-question or a related question instead.

Types of Data Science Questions

In approximate order of difficulty:

  • Descriptive.
  • Exploratory.
  • Inferential.
  • Predictive.
  • Causal.
  • Mechanistic.

For more information please check this file:

03_01_typesOfQuestions