Wednesday, October 4, 2023
Data Science for Internal Exam : TYBCS : Semester 5
Q. What is data science?
Answer: Data science is an interdisciplinary field that involves the use of various techniques, algorithms, processes and systems to extract valuable insights and knowledge from data.
It combines elements of computer science, mathematics, statistics, and domain expertise to analyze and interpret large and complex datasets.
Q. What are applications of data science?
Answer: The applications of data science are:
1) Image recognition and speech recognition.
2) Internet search in search engines such as Google, Yahoo and Bing.
3) Self-driving cars in the transport industry.
4) Recommendation systems such as those of Amazon and Netflix, which give users personalized recommendations and a better experience.
Q. Explain different types of data
1) Structured
2) Unstructured
3) Semi-structured
Answer:
Data: It is a set of raw facts such as descriptions, observations and numbers that need to be processed to become meaningful.
Structured Data:
- It is the type of data that is well organized.
- It is easy to store in tables within a database or in Excel files.
- It is easy to search because each value resides in a fixed field within a record.
- PostgreSQL is an example of a tool for storing structured data.
- Examples: Phone numbers, zip codes, etc.
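The points above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database such as PostgreSQL; the table name, column names and sample rows are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Every record has the same fixed fields (columns).
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, zip_code TEXT)")

# Made-up sample rows, for illustration only.
rows = [
    ("Asha", "9876543210", "411001"),
    ("Ravi", "9123456780", "400001"),
    ("Meera", "9988776655", "411001"),
]
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)

# Searching is easy because the zip code always sits in the same field.
cursor = conn.execute(
    "SELECT name FROM contacts WHERE zip_code = ? ORDER BY name", ("411001",)
)
matching_names = [row[0] for row in cursor]
print(matching_names)

conn.close()
```

Because the zip code is always in the same column, one query is enough to find every matching record.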
Unstructured Data:
- It is the type of data that is not organized in a predefined manner.
- Its content varies and is context specific.
- It may be stored in a non-relational (NoSQL) database.
- It is challenging to process this type of data.
- Examples:
Human-generated data such as social media posts, regular email or text messages, and data from photos, video or audio.
Machine-generated data such as weather data, military movements, and digital surveillance photos and videos.
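A rough sketch of why unstructured data takes extra processing: free text has no fixed fields, so some structure has to be extracted first. The sample posts and the `tokenize` helper below are made up for illustration, and real text processing usually needs far more care than this.

```python
import re
from collections import Counter

# Made-up social media posts: free text with no fixed fields.
posts = [
    "Loving the new phone, the camera is great!",
    "Camera is ok but battery is bad...",
    "great battery, great screen",
]

def tokenize(text):
    """Lowercase the text and return its runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

# Count how often each word appears across all posts.
word_counts = Counter()
for post in posts:
    word_counts.update(tokenize(post))

print(word_counts.most_common(3))
```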
Semi-structured Data:
- It is a type of data that contains semantic tags but is not structured like a typical relational database.
- It has internal tags that identify separate data elements.
- Examples: XML, a markup language that is both human-readable and machine-readable; JSON, which is built from name/value pairs and ordered lists of values; LinkedIn data in which business users share job titles, skills and more.
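As a small sketch of how tags identify the data elements, the same made-up profile is written below once as JSON and once as XML, then read with Python's standard json and xml.etree.ElementTree modules. The field names and values are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up profile in two semi-structured formats.
json_text = '{"name": "Asha", "title": "Data Analyst", "skills": ["Python", "SQL"]}'
xml_text = """
<profile>
  <name>Asha</name>
  <title>Data Analyst</title>
  <skills>
    <skill>Python</skill>
    <skill>SQL</skill>
  </skills>
</profile>
""".strip()

# JSON: name/value pairs become a dict, the ordered list becomes a list.
profile_from_json = json.loads(json_text)

# XML: the tags tell us which element holds which piece of data.
root = ET.fromstring(xml_text)
profile_from_xml = {
    "name": root.findtext("name"),
    "title": root.findtext("title"),
    "skills": [skill.text for skill in root.findall("skills/skill")],
}

print(profile_from_json)
print(profile_from_xml)
```

Both formats carry the same information here; only the tagging style differs.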
Q. What are the different types of data formats?
Answer: The different types of data formats are:
1) Integers
2) Floats
3) Text data (strings, numbers or characters)
4) Dense numerical arrays (large arrays)
5) Compressed or archived formats
6) CSV files: Comma-separated values, which store data in tabular form.
7) HTML files
8) JSON files: JavaScript Object Notation.
9) XML files: Extensible Markup Language.
10) Tar files: A tar file archives multiple files into one file so that they can be shared over the internet.
11) Gzip files: They have the .gz extension and are created using the standard GNU zip (gzip) compression algorithm.
12) Zip files: Zip is an archive format that makes it easy to send and back up large files or groups of files.
13) Image files: These are standardized means of organizing and storing digital images. Formats include JPEG, GIF, PSD, SVG and PDF.
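A minimal sketch of a few of these formats, using only Python's standard library: a small made-up table is written as CSV, compressed with gzip, and packed into a zip archive. Everything is kept in memory, and the file names inside the archive are hypothetical.

```python
import csv
import gzip
import io
import zipfile

# Made-up tabular data; CSV stores every value as text.
records = [
    {"name": "Asha", "marks": "82"},
    {"name": "Ravi", "marks": "74"},
]

# CSV: one header line, then one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "marks"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Gzip: compress the bytes of a single file, then restore them.
compressed = gzip.compress(csv_text.encode("utf-8"))
restored_text = gzip.decompress(compressed).decode("utf-8")

# Zip: bundle several files into one archive.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("marks.csv", csv_text)
    archive.writestr("notes.txt", "Sample notes file.")

# Read the archive back and parse the CSV file stored inside it.
with zipfile.ZipFile(archive_bytes) as archive:
    file_names = archive.namelist()
    csv_from_zip = archive.read("marks.csv").decode("utf-8")
    rows_from_zip = list(csv.DictReader(io.StringIO(csv_from_zip)))

print(file_names)
print(rows_from_zip)
```

Gzip compresses one stream of bytes, while zip (like tar) groups several files together, which is why the two are often described separately.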
Q. What are the tools of a data scientist?
Answer: The following are different tools of a data scientist:
1) Python programming: It provides a rich set of libraries for machine learning, data processing and analysis. It also allows us to plot many types of graphs for visualization.
2) R programming: It provides a scalable software environment for statistical analysis.
3) SAS (Statistical Analysis System): SAS is used to analyze data using the SAS programming language and to perform statistical modeling. It is mainly used for integrating data from multiple sources and generating statistical results.
4) Tableau Public: It is data visualization software that is packed with powerful graphics for making interactive visualizations. It also has drag-and-drop features and easily accessible menus.
5) Microsoft Excel: It represents data in a simple way using rows and columns, and comes with various formulae and filters for data science. It can handle complex numerical calculations, generate pivot tables and display graphics.
6) RapidMiner: It provides a suitable environment for data preparation. It can track data in real time and perform high-end analytics.
7) Apache Spark: It is very fast when dealing with large datasets. It also includes Spark SQL, Spark Streaming and a machine learning library (MLlib).
8) KNIME: It is data analysis and reporting software. It places practically no limit on the amount of input data fed into the system.
9) Apache Flink: It can quickly carry out real-time data analysis and reduces the complexity of doing so.
Q. Measure of central tendency
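The measures of central tendency usually covered are the mean, the median and the mode. A minimal sketch using Python's standard statistics module, assuming a small made-up list of marks:

```python
import statistics

# Made-up marks for five students.
marks = [72, 85, 85, 90, 68]

mean_marks = statistics.mean(marks)      # sum of values divided by their count
median_marks = statistics.median(marks)  # middle value of the sorted list
mode_marks = statistics.mode(marks)      # most frequently occurring value

print(mean_marks, median_marks, mode_marks)
```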
Q. Types of statistical analysis
1) Descriptive
2) Inferential
About Abhishek Dhamdhere