Data Science 101: What It Is, What Data Scientists Do, and Real World Examples

What if I told you that data science could be used in almost every (if not every) industry you could think of? Healthcare? Check. Finance? Check. The government? Check. If there’s data to be collected, data science can and should be used. Artificial intelligence expert Andrew Ng (founder of DeepLearning.AI) seems to agree. Data science is a process that uses AI principles, and Ng believes that “the shift to data-centric AI is the most important shift businesses need to make today to take full advantage of artificial intelligence.” To put it plainly, data and data science aren’t going anywhere, especially with the continued rise of AI.

So, what is data science? Keep reading to learn more about data science, its process, what data scientists do, and how they bring their applications into the real world.

Let’s dive in!

Is Tech Right For you? Take Our 3-Minute Quiz!

You Will Learn: If a career in tech is right for you What tech careers fit your strengths What skills you need to reach your goals

Take The Quiz!

What is Data Science?

Data science is the study of data to organize, analyze, and interpret information. Companies and organizations hire data scientists to uncover relevant insights about their business to help guide their strategy. They use data to find patterns and make predictions that can influence decisions like product planning, marketing, trend forecasting, and much more.

At first glance, the term “data science” might seem one-dimensional. It’s data and nothing else, right? Wrong. Data science is a multidisciplinary approach that combines different practices — mathematics, statistics, artificial intelligence, computer science, and advanced analytics techniques like machine learning algorithms, deep learning, and predictive modeling.

The Oxford Learner’s Dictionaries describes data as “facts or information, especially when examined and used to find out things or to make decisions.” When you think — really think — about what data is, it’s easy to come to the conclusion that the world runs on data and that we use the process of data science to make even the smallest decisions.

Why is Data Science Important?

Imagine you’re on the hunt for a new smartphone. You’re weighing the options of upgrading to a newer model that came out a year ago or buying the latest version. Research leads you to the cost of each phone, its features, and tech specifications. You keep track of all the information in a document that you periodically update so you can compare the two phones. You take some time to consider which is best for you before deciding to go with the latest model. To you, all you did was pick a new phone. To me, you used data science to make your decision.

Choosing a new phone seems inconsequential when you compare it to the potential that data science has in industries like healthcare, travel, cybersecurity, and more. Women’s health apps use data science to track menstrual cycles and ovulation schedules, which can help women get pregnant. Data science is also at the heart of air traffic control, where traffic flow and weather data are used to suggest flight routes, predict traffic congestion, and reduce delays.

Data is everywhere, and data science is what we use to make sense of it.

The Data Science Life Cycle

To get to the point in the data science process where you can use data to make predictions and decisions, you have to go through the data science life cycle. The number of steps differs depending on who you ask, but if you ask me, the cycle can be neatly tied into five steps: collection (data capturing), warehousing (data maintenance), mining (data processing), exploration and confirmation (data analysis), and reporting (data communication).

Step 1. Data Collection

The data science life cycle begins with data collection. It starts with identifying sources — databases, APIs, online platforms, surveys, etc. — and pulling structured and unstructured data from those sources. Think of customer information (names, addresses, credit card numbers) as structured data and pictures and videos as examples of unstructured data. Not only does the data acquisition process include data entry, but it also calls for data extraction like web scraping — a method of extracting data from websites. Depending on the source, data scientists need different tools for extraction. For example, SQL queries are ideal for databases while Python scripts are pretty universal with their ability to extract data from databases, APIs, websites, and CSV files.
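
To make the difference between those extraction tools concrete, here is a minimal Python sketch that collects the same kind of customer record from two sources: a database (with a SQL query) and a CSV export. The table, column names, and sample rows are made up for illustration, and an in-memory SQLite database stands in for a real one.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: an in-memory SQLite database stands in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)

def collect_from_database(connection):
    """Pull structured rows out of a database with a SQL query."""
    rows = connection.execute("SELECT name, city FROM customers ORDER BY name")
    return [{"name": name, "city": city} for name, city in rows]

def collect_from_csv(text):
    """Parse CSV text (for example, an exported spreadsheet) into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical CSV export from a second source.
CSV_EXPORT = "name,city\nKatherine,Hampton\n"

# Both sources end up in the same list-of-dictionaries shape.
records = collect_from_database(conn) + collect_from_csv(CSV_EXPORT)
```

In a real project the SQL query would run against a production database and the CSV text would come from a file or a download, but the idea is the same: each source gets its own small extraction function.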

Step 2. Data Warehousing

The purpose of a warehouse is to store products, equipment, etc. In the data science life cycle, once you’ve collected and extracted data, it needs to be stored and maintained in a data warehouse. Before it makes its way to the warehouse, the data needs to be cleaned and integrated. Data could come in dozens of different formats, including text (TXT, HTML, XML), numeric, multimedia (JPEG, PNG, MP3, MP4), tabular (CSV, Excel), and more. Cleaning removes errors and inconsistencies, while ETL (extract, transform, load) jobs “combine data from multiple sources into a single, consistent data set for loading into a data warehouse.”
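
Here is a rough sketch of what a tiny extract, transform, load job could look like in Python. Everything in it is an assumption made for illustration: the two sources, the field names, the cleaning rules, and the use of an in-memory SQLite database as a stand-in for a real warehouse.

```python
import json
import sqlite3

# Hypothetical raw inputs from two sources that format the same fields differently.
CRM_JSON = '[{"Name": " ada lovelace ", "Spend": "120.50"}]'
SHOP_ROWS = [("GRACE HOPPER", 80), ("Ada Lovelace", 30)]

def extract():
    """Extract: read each source into (name, spend) pairs."""
    crm = [(row["Name"], row["Spend"]) for row in json.loads(CRM_JSON)]
    return crm + SHOP_ROWS

def transform(rows):
    """Transform: clean names, convert spend to a number, and merge duplicates."""
    totals = {}
    for name, spend in rows:
        clean_name = name.strip().title()
        totals[clean_name] = totals.get(clean_name, 0.0) + float(spend)
    return sorted(totals.items())

def load(rows, connection):
    """Load: write the consistent rows into a warehouse table."""
    connection.execute("CREATE TABLE spend (name TEXT PRIMARY KEY, total REAL)")
    connection.executemany("INSERT INTO spend VALUES (?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract()), warehouse)
result = warehouse.execute("SELECT name, total FROM spend ORDER BY name").fetchall()
```

After the job runs, the two differently formatted "Ada Lovelace" entries have been merged into one row, which is the "single, consistent data set" the ETL step is after.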

Step 3. Data Mining (or Processing)

Data mining is the process of sifting through large data sets to identify patterns and relationships. This stage relies on statistics, data analytics, and machine learning techniques. Classification organizes labeled data into known categories, while clustering groups unlabeled data into “clusters” based on how similar the data points are. This grouping allows the data to be further analyzed during the next step of the data science life cycle.
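
To show what grouping unlabeled data can look like, here is a bare-bones sketch of k-means, one common clustering algorithm (chosen here for illustration, not because the life cycle requires it). The customer data is hypothetical, and starting from the first k points is a simplification; real libraries pick starting points more carefully.

```python
import math

def k_means(points, k, iterations=10):
    """Group unlabeled 2D points into k clusters (a bare-bones k-means)."""
    centroids = list(points[:k])  # simple deterministic start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical customers described by (visits per month, average order value).
customers = [(1, 10), (9, 95), (2, 12), (10, 100), (1, 15), (8, 90)]
occasional, frequent = k_means(customers, k=2)
```

Nobody told the algorithm which customers were "occasional" or "frequent." It found the two groups on its own from how close the points are to each other, and the labels are only names we attach afterward.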

Step 4. Data Exploration and Confirmation

The data analytics and machine learning needed during processing make a reappearance during the data analysis phase. Alongside classification, another type of supervised machine learning comes into play here: regression, a technique for understanding relationships between variables. Combined with predictive and qualitative analytics, it lets data scientists explore patterns, make predictions, and guide decision-making.
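
As a small worked example of regression, the sketch below fits a straight line to two variables using ordinary least squares and then uses that line to make a prediction. The ad spend and sales numbers are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (in $1,000s) and units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 6 + intercept  # predicted units sold at $6,000 of ad spend
```

With this made-up data the relationship is perfectly linear, so the fitted line has a slope of 2 and an intercept of 10, and the forecast for $6,000 of ad spend is 22 units. Real data is noisier, which is why data scientists also check how well the line actually fits before trusting its predictions.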

Step 5. Data Reporting

The last step in the data science life cycle is for data scientists to communicate their findings. There are multiple ways to go about presenting this information, but the most common is through reports and data visualization. Data visualization is the representation of data through visual graphics—charts, graphs, maps, dashboards, etc. This makes it possible for stakeholders—who likely have no idea about the data science process—to understand the information and transform the data insights into actionable business decisions.
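
Visualization tools like dashboards do a lot more than this, but the core idea can be sketched in a few lines of Python: turn numbers into shapes that are easy to compare at a glance. The function name and the quarterly sales figures below are hypothetical.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> value as horizontal bars scaled to `width` characters."""
    longest_label = max(len(label) for label in data)
    largest_value = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest_value * width)
        lines.append(f"{label.ljust(longest_label)} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical quarterly sales figures.
sales = {"Q1": 50, "Q2": 100, "Q3": 75}
chart = text_bar_chart(sales)
print(chart)
```

Even in plain text, a stakeholder can see right away that Q2 was the strongest quarter without reading a single number.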

What is a Data Scientist and What Do They Do?

I bet you didn’t know that a career in data science is all math, computer engineering, AI, and... sex appeal? It is if you ask Harvard Business Review, which named data scientist the sexiest job of the 21st century. What’s even sexier? The median annual salary for data scientists is $103,500.

As previously mentioned, data science is a multidisciplinary approach that uses the principles of multiple practices and boils them down into one job. So what is a data scientist and what do they do? Harvard Business Review thought this was a question best answered by data scientists, and I happen to agree. Here’s how they described the job of a data scientist:

“First, data scientists lay a solid data foundation in order to perform robust analytics. Then they use online experiments, among other methods, to achieve sustainable growth. Finally, they build machine learning pipelines and personalized data products to better understand their business and customers and to make better decisions. In other words, in tech, data science is about infrastructure, testing, machine learning for decision making, and data products.”

You could consider this the general scope of data science jobs, but their day-to-day responsibilities might look like:

  • Identifying valuable and relevant data sources
  • Collecting structured and unstructured data
  • Performing tests to assess and improve the data collection process
  • Building predictive models and machine learning algorithms
  • Analyzing data to identify trends and patterns
  • Communicating insights and recommendations to stakeholders based on data analysis

Skills and Tools of a Data Scientist

For a successful career, data scientists need well-rounded and highly technical skills. The data science skills, frameworks, and technologies needed to work with large quantities of data include:

  • Mathematical skills (linear algebra, linear regression, and statistical analysis)
  • Programming languages (Python, R)
  • Big data technologies (Apache Hadoop and Apache Spark)
  • Experience in data mining
  • Database management and tools (SQL, NoSQL, Microsoft Excel)
  • Machine learning techniques (linear regression, logistic regression, decision trees, etc.)
  • Experience working with APIs
  • Data visualization (Tableau, Google Charts, Microsoft Power BI, D3.js)
  • Soft skills: Problem-solving, attention to detail, communication

Data scientists aren’t the only tech professionals who need these skills and qualifications. Other data science careers with tons of overlap include data analysts, data engineers, machine learning scientists, data architects, and business intelligence developers.

📌 Related: To learn more, check out our blog post all about how to become a data scientist.

Data Science – In the Real World

Data science is an everyday process, whether you’re a data scientist or not. Remember our earlier example of choosing a smartphone? That was data science on a small scale. When data scientists go through the data science life cycle, they’re making waves in healthcare, finance, and the government.

Here are some real-life examples of data science to consider:

Healthcare

Data science is used for tracking and preventing diseases, developing new medications, tracking menstrual cycles, developing vaccines, and one of its biggest uses—medical imaging. Imagine a dancer comes into a doctor’s office with a hairline fracture in their foot. These small fractures can be difficult to see with the human eye. Data science makes it possible to develop technologies that can scan these images and pinpoint even the smallest irregularities.

Finance

In the finance industry, data science can be used to analyze the market, detect fraud, forecast financial trends, allocate loans, and manage risks. If you apply for a loan, financial institutions go through a long process for approval. It may take minutes (or even seconds) on your end, but they automate their systems to go through tons of data to identify risks. For example, credit card companies may use data science to look into your financial background and even your social media (yes, these programs may be sifting through your Facebook statuses from 10 years ago) to figure out whether you’re trustworthy and likely to make payments on your account.

Government

The government uses data science in matters like law enforcement, national defense, and detecting tax evasion. One of the more interesting uses of data science is its place in emergency response. In emergencies, data science provides real-time analytics (location, population density, time of day, weather conditions, and more) to help the government optimize its resources, communicate with the public, and mitigate further risks, if necessary. In the event of a big storm, this might look like the government using data science technologies that pull data from the news and social media to plan and prioritize which areas need disaster relief the most.

The Future of Data Science

Data science is a process that lets you take large data sets and make sense of them. Data scientists use their skills (programming languages, machine learning techniques, and data mining experience) and data science tools (big data technology, databases, and data visualization programs) to analyze data and make predictions so businesses can improve their products and services. And while this sounds good for businesses, it’s also good for the greater population. Companies are becoming more data-driven, and when data science is used in finance, transportation, and healthcare, it has the potential to be life-changing and life-saving.

Do you think data science is sexy now? I mean, what’s sexier than a field that can positively impact everyone’s lives? And sure, the six-figure median salary doesn’t hurt either. If you’re interested in a career in data science, the time to start is now. With Skillcrush’s Break Into Tech program, you can learn the fundamentals of web development that will help you prepare for a successful future in data science.

Jouviane Alexandre

After spending her formative years in the height of the Internet Age, Jouviane has had her fair share of experience in adapting to the inner workings of the fast-paced technology industry. Note: She wasn't the only 11-year-old who learned how to code when building and customizing her MySpace profile page. Jouviane is a professional freelance writer who has spent her career covering technology, business, entrepreneurship, and more. She combines nearly a decade’s worth of experience, hours of research, and her own web-building projects to help guide women toward a career in web development. When she's not working, you'll find Jouviane binge-watching a series on Netflix, planning her next travel adventure, or creating digital art on Procreate.