Web Intelligence and Big Data PYQ, B.Tech 8th Sem (Unit 1 & 2)
Long Answer Type
1. (a) What do you mean by Web intelligence? How can we create web intelligent document and queries? Give suitable example.
Web intelligence refers to the use of artificial intelligence and machine learning techniques to extract insights and knowledge from the vast amounts of data present on the web. It involves the use of various tools and technologies to analyze, filter, and categorize data and to create intelligent documents and queries.
To create web intelligent documents, we can use techniques such as data mining, natural language processing, and machine learning algorithms to extract relevant information from web pages, social media feeds, and other sources. For example, a news aggregation website can use machine learning algorithms to analyze news articles and automatically categorize them into topics such as politics, sports, and entertainment.
To create web intelligent queries, we can use techniques such as semantic analysis and natural language processing to understand the user's intent and generate relevant search results.
Example: Voice assistants such as Siri or Google Assistant use natural language processing to understand user queries and provide relevant information and search results.
(b) Differentiate between flat data, text and graphics. Discuss various methods by which web intelligent report can be prepared.
- Flat data refers to data that is stored in a simple, two-dimensional table format. Text data refers to data that is stored in a textual format, while graphics refer to visual elements such as images and charts.
- Web intelligent reports can be prepared using various methods such as data mining, machine learning, and natural language processing.
- These reports can include interactive elements such as charts and graphs, and can be customized based on user preferences and data inputs.
2. (a) What are the three application modes available for a document? Differentiate between data, reading and design modes.
- The three application modes available for a Web Intelligence document are Data, Reading, and Design.
- Data mode is used to view and manage the data providers (queries) behind a document. Reading mode displays the finished document for viewing, filtering, and drilling without changing its structure. Design mode is used to create or modify the structure of a document, such as its tables, charts, and formulas.
(b) How the various sources of data are synchronized? How multiple queries are created by using the data by merging dimensions? State various rules for merging the dimensions. Explain using examples.
- The various sources of data can be synchronized using data integration techniques such as ETL (extract, transform, load) and ELT (extract, load, transform).
- Multiple queries can be created by merging dimensions, which involves combining related attributes to create a more comprehensive view of the data.
- Rules for merging dimensions include ensuring that the attributes being merged are related, maintaining data accuracy, and avoiding redundancy.
- Example: Merging the "product" and "category" dimensions in an e-commerce database to create a more comprehensive view of sales data.
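The product/category example above can be sketched with plain Python dictionaries standing in for the two dimension tables (the table contents and field names are hypothetical):

```python
# Hypothetical dimension tables from an e-commerce database.
products = [
    {"product_id": 1, "name": "Laptop", "category_id": 10},
    {"product_id": 2, "name": "Phone", "category_id": 10},
    {"product_id": 3, "name": "Desk", "category_id": 20},
]
categories = [
    {"category_id": 10, "category": "Electronics"},
    {"category_id": 20, "category": "Furniture"},
]

def merge_dimensions(products, categories):
    """Join the two dimensions on their shared key, category_id."""
    by_id = {c["category_id"]: c["category"] for c in categories}
    # Rule: only merge on a shared, related attribute, and keep the
    # join accurate -- rows without a matching key are not invented.
    return [
        {**p, "category": by_id[p["category_id"]]}
        for p in products
        if p["category_id"] in by_id
    ]

merged = merge_dimensions(products, categories)
print(merged[0]["category"])  # Electronics
```

The merged rows give the "more comprehensive view": each product now carries its category name, so sales can be grouped by either dimension.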
3 (a) What is the use of SAP BODS?
- SAP BODS stands for SAP BusinessObjects Data Services, which is a data integration and ETL tool provided by SAP.
- SAP BODS helps organizations to have a unified and consistent view of their data from various sources.
- It helps organizations to extract data from various sources, transform it, and load it into various targets such as data warehouses, data marts, or operational data stores.
- It provides a graphical user interface that enables users to design, test, and deploy data integration jobs.
- It can handle both batch and real-time data integration scenarios.
- SAP BODS also provides data quality features such as data profiling, data cleansing, and data enrichment.
(b) Why did Google, Facebook, LinkedIn, eBay, Amazon, etc. not use traditional databases for big data? Give justification.
Google, Facebook, LinkedIn, eBay, Amazon, and other companies deal with an enormous amount of data every day, and traditional databases are not capable of handling such vast amounts of data. Here are some reasons why these companies do not use traditional databases for big data:
Scalability: Traditional databases have scalability limitations. They are designed to handle a limited amount of data, and they cannot scale horizontally. In contrast, big data technologies like Hadoop, Cassandra, and MongoDB can scale horizontally and handle petabytes of data.
Cost: Traditional databases are expensive to scale and maintain. Big data technologies are open-source and can be deployed on commodity hardware, reducing the overall cost.
Performance: Big data technologies are designed for distributed computing, which means that they can process data faster than traditional databases.
Flexibility: Big data technologies allow companies to store and analyze data in various formats, including structured, semi-structured, and unstructured data. Traditional databases are limited to structured data.
Real-time processing: Big data technologies like Apache Storm, Flink, and Spark Streaming provide real-time processing capabilities, whereas traditional databases cannot process data in real-time.
4 (a) What is Cognos software?
Cognos is a business intelligence software developed by IBM that provides a range of tools for reporting, analysis, scorecarding, and monitoring of events and metrics. It enables organizations to extract data from different sources, transform it into meaningful insights, and share it in various formats with different stakeholders. The software provides an integrated platform that includes modules for query, reporting, analysis, and planning.
Cognos software helps businesses to gain insights into their operations and make informed decisions based on the data. The software is designed to be user-friendly and provides a range of features that enable users to create custom reports, build dashboards, and analyze data in real-time. It can also be integrated with other IBM software such as SPSS, Watson, and other data management tools.
Cognos software is widely used in various industries such as healthcare, finance, manufacturing, and retail. The software enables organizations to monitor their performance and identify areas for improvement. With its advanced analytics and reporting capabilities, Cognos helps businesses to make informed decisions that drive growth and success.
(b) How is the following achieved in Big Data?
(i) Massive parallelism (ii) Map-Reduce paradigm
(i) Massive parallelism in big data is achieved by distributing the data across multiple nodes in a cluster and processing it in parallel. This allows for faster processing of large datasets. The parallelism can be achieved through various techniques such as partitioning, sharding, and replication.
(ii) MapReduce paradigm is a programming model used to process large datasets in parallel across a distributed cluster. It consists of two stages - Map and Reduce. The Map stage processes data in parallel across the cluster and produces intermediate results, which are then aggregated by the Reduce stage. The MapReduce paradigm is used in many big data processing systems, such as Apache Hadoop and Spark.
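The two stages can be sketched in-memory in Python (a toy illustration, not a distributed Hadoop job; the shuffle step stands in for the grouping the framework performs between the stages):

```python
from collections import defaultdict

def map_phase(documents):
    """Map stage: emit an intermediate (word, 1) pair for every word."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce stage: aggregate the list of values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # 2
```

In a real cluster the map calls run in parallel on different nodes, and each reducer receives only the keys hashed to it, but the data flow is exactly this.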
Short Answer Type
(a) Differentiate between the cross-tab table and forms table with example.
A cross-tab table (cross table) displays data in a matrix, with one dimension across the columns, another down the rows, and measures aggregated in the body cells. It is used to summarize data and make comparisons between categories. For example, a cross-tab table can show the total sales of each product (rows) in each region (columns).
A forms table, on the other hand, displays the fields of each record vertically, one record after another, like a detail card or label. It is useful for record-level views, for example showing each customer's name, address, phone number, and email as a block.
(b) Differentiate between Serializable Vs Writable methods with reference to MapReduce and explain why sort phase cannot be serializable? What are wrappers? Write a simple word count wrapper.
Serializable is Java's built-in serialization interface: a Serializable object can be converted to a byte stream and reconstructed later, but the format is verbose, carries class metadata, and is slow for the huge volumes of intermediate data moved between the Map and Reduce phases.
Writable is Hadoop's own lightweight serialization interface. A Writable object defines compact write() and readFields() methods, and Writable types are used in MapReduce to represent the input and output keys and values of the Map and Reduce functions.
The sort phase cannot rely on Serializable because sorting requires comparing keys, and Serializable defines no ordering; deserializing every object just to compare it would also be prohibitively slow. Hadoop therefore requires keys to implement WritableComparable, which adds a compareTo() method so that keys can be compared efficiently, often directly on their serialized bytes.
Wrappers are Writable classes that wrap Java primitives and common types, such as IntWritable for int and Text for String, so that plain values can be used as MapReduce keys and values. In a word count job, for example, the mapper emits (Text, IntWritable) pairs, where the Text key is a word and the IntWritable value is its count.
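Hadoop's actual wrappers are Java classes (IntWritable, Text), but the idea can be sketched in Python; the class and method names below are illustrative, not Hadoop's API:

```python
class IntWrapper:
    """Sketch of an IntWritable-style wrapper: a compact, fixed-width encoding."""
    def __init__(self, value=0):
        self.value = value
    def write(self):
        # Serialize to 4 raw bytes, with no class metadata attached.
        return self.value.to_bytes(4, "big", signed=True)
    @classmethod
    def read(cls, data):
        return cls(int.from_bytes(data, "big", signed=True))

class TextWrapper:
    """Sketch of a Text-style wrapper around a string."""
    def __init__(self, value=""):
        self.value = value
    def write(self):
        return self.value.encode("utf-8")
    @classmethod
    def read(cls, data):
        return cls(data.decode("utf-8"))

def word_count(text):
    """Emit (TextWrapper, IntWrapper) pairs, as a word count mapper+reducer would."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return [(TextWrapper(w), IntWrapper(c)) for w, c in counts.items()]

pairs = word_count("to be or not to be")
```

The point of the wrapper is the write()/read() round trip: the value travels between nodes as raw bytes and is reconstructed on the other side.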
(c) Explain the process of Page Rank Searching algorithm in brief and enlist various methods.
The PageRank algorithm is a method used to measure the importance of web pages in search engine results. It was developed by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University.
The basic idea behind the PageRank algorithm is that the importance of a web page is proportional to the number and quality of links pointing to that page. The algorithm works by recursively calculating the PageRank of each page in a graph of web pages, based on the PageRank of the pages linking to it.
There are various methods used in the PageRank algorithm, including:
Random Surfer Model: This method assumes that a random surfer starts at a page and randomly jumps to other pages, following the links on those pages. The probability of the surfer being on a particular page is proportional to the PageRank of that page.
Matrix Formulation: This method represents the web graph as a matrix, where each row represents a page and each column represents a link from that page. The matrix is then multiplied by a vector of the initial PageRank values to get the new PageRank values.
Power Iteration: This method repeatedly multiplies the PageRank vector by the transition matrix until the PageRank values converge.
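The power-iteration method can be sketched in plain Python; the damping factor of 0.85 and the three-page graph below are illustrative assumptions:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a small web graph.
    links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform rank vector
    for _ in range(iterations):
        # The (1 - damping) term is the random surfer's chance of jumping anywhere.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical 3-page web graph: A and C both link to B.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # B
```

B ends with the highest rank because it receives links from both A and C, which is exactly the "importance flows along links" intuition above.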
(d) What are combiners? Discuss the advantages and disadvantages of combiners.
Combiners are a type of optimization technique used in MapReduce and other distributed computing systems. They allow the merging of intermediate key-value pairs at the mapper node before sending them to the reducer node. The advantages of using combiners are:
Reduced network traffic: By combining intermediate values at the mapper node, less data is transmitted over the network, resulting in reduced network traffic.
Lowered memory requirements: Combining intermediate values at the mapper node reduces the number of key-value pairs that need to be stored in memory, thereby lowering memory requirements.
Improved performance: Combining intermediate values at the mapper node can improve the performance of the reducer phase by reducing the amount of data that needs to be processed.
The disadvantages of using combiners are:
Combiners may not always be applicable: In some cases, applying combiners may not be feasible due to the nature of the data or the operation being performed.
Combiners may increase processing time: Depending on the size of the intermediate values, applying combiners may actually increase processing time due to the additional overhead of the combiner function.
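Because addition is associative and commutative, a word count combiner can safely pre-aggregate at the mapper. A minimal in-memory sketch (the function names and documents are illustrative):

```python
from collections import Counter

def mapper(doc):
    """Emit a (word, 1) pair for every word."""
    return [(word, 1) for word in doc.lower().split()]

def combiner(pairs):
    # Local pre-aggregation at the mapper node. This is valid only because
    # addition is associative and commutative, so partial sums of partial
    # sums give the same final answer.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reducer(all_pairs):
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

docs = ["big data big data", "big big big"]
raw = [pair for d in docs for pair in mapper(d)]            # 7 pairs cross the network
combined = [p for d in docs for p in combiner(mapper(d))]   # only 3 pairs cross
print(len(raw), len(combined))  # 7 3
```

The final counts are identical either way; the combiner only shrinks the intermediate data, which is the "reduced network traffic" advantage above. An operation like computing an average cannot be combined this naively, which is the applicability caveat.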
(e) How can Locality-Sensitive Hashing (LSH) be carried out in main memory?
Locality-Sensitive Hashing (LSH) is a technique used in data mining and machine learning to find similar items in large datasets by hashing items so that similar items are likely to land in the same bucket. To carry it out in main memory, the items are first compressed into short signatures (for example, MinHash signatures for sets), which are far smaller than the original items and can be held in memory. The signatures are split into bands, each band is hashed into buckets, and only items sharing a bucket are compared as candidate pairs. If even the signatures do not fit in memory, the data can be partitioned into smaller subsets and LSH applied to each subset in turn.
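A minimal MinHash-plus-banding sketch in Python (the hash family, band/row counts, and document sets are illustrative assumptions):

```python
def minhash_signature(item_set, hash_funcs):
    """MinHash: for each hash function, keep the smallest hash value over the set.
    Similar sets agree at each position with probability close to their Jaccard similarity."""
    return tuple(min(h(x) for x in item_set) for h in hash_funcs)

def lsh_buckets(sets, hash_funcs, bands, rows):
    """Split each signature into bands; sets matching on any whole band share a bucket."""
    buckets = {}
    for name, s in sets.items():
        sig = minhash_signature(s, hash_funcs)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets.setdefault(key, []).append(name)
    return buckets

# A small deterministic hash family h(x) = (a*x + 7) mod 101 (illustrative only).
hash_funcs = [lambda x, a=a: (a * x + 7) % 101 for a in (3, 5, 11, 13, 17, 19, 23, 29)]

# Hypothetical documents represented as sets of shingle IDs.
sets = {"doc1": {1, 2, 3, 4, 5}, "doc2": {1, 2, 3, 4, 6}, "doc3": {70, 80, 90}}
buckets = lsh_buckets(sets, hash_funcs, bands=4, rows=2)

# Only sets that share a bucket become candidate pairs for exact comparison.
candidates = sorted({tuple(sorted(v)) for v in buckets.values() if len(v) > 1})
print(candidates)  # [('doc1', 'doc2')]
```

Only the signatures and buckets are kept in memory, never all pairwise comparisons, which is what makes the approach feasible for large datasets.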
(f) Differentiate between the Column Oriented and Row Oriented data storage mechanism with suitable example.
Column-oriented and row-oriented are two different data storage mechanisms used in database management systems. In row-oriented storage, all the values of a record are stored together, which suits transactional workloads that read or write whole records; MySQL, commonly used for web applications, is a suitable example. In column-oriented storage, all the values of a column are stored together, which suits analytical queries that scan a few columns across many rows and compresses well; Apache Cassandra, designed to handle large amounts of data across multiple commodity servers, is a suitable example.
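The difference can be illustrated with two in-memory layouts of the same hypothetical sales table:

```python
# Hypothetical sales records.
rows = [
    {"id": 1, "product": "Laptop", "price": 900},
    {"id": 2, "product": "Phone", "price": 500},
    {"id": 3, "product": "Desk", "price": 150},
]

# Row-oriented layout (MySQL-style): each record stored contiguously.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout (columnar-style): each column stored contiguously.
col_store = {
    "id": [r["id"] for r in rows],
    "product": [r["product"] for r in rows],
    "price": [r["price"] for r in rows],
}

# An analytic query (sum one column) touches only that column in a column store:
total = sum(col_store["price"])  # reads 3 values, not 3 whole records
# A transactional lookup (fetch one full record) is a single read in a row store:
record = row_store[1]
print(total)  # 1550
```

The same data is in both layouts; only the access pattern each layout favors differs.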
(g) What is Big data? Why Big-Data is required? Explain the three V's of Big Data Characteristics. When Big-Data becomes really a problem? Elaborate.
Big data refers to the large and complex datasets that cannot be easily processed or analyzed using traditional data processing methods. Big data is required because it allows organizations to gain insights and make better decisions based on large and diverse data sources. The three V's of big data characteristics are:
Volume: Refers to the large amount of data that needs to be processed and analyzed.
Velocity: Refers to the speed at which data is generated and needs to be processed in real-time.
Variety: Refers to the different types of data, including structured, semi-structured, and unstructured data, that need to be processed and analyzed.
Big data becomes really a problem when traditional data processing methods are unable to handle the volume, velocity, and variety of the data. This can result in slower processing times, increased costs, and missed opportunities for organizations to gain insights and make informed decisions.
Very Short Answer Type
(i) What is Web Intelligence?
Web Intelligence refers to the ability to gather and analyze data from various sources on the web, and use that data to inform decision-making and business intelligence. It involves using various technologies and tools to extract meaningful insights from data collected from the internet.
(ii) What is SAP BO?
SAP BO, or SAP BusinessObjects, is a suite of business intelligence (BI) tools and applications developed by SAP. It includes a range of tools for reporting, analytics, dashboarding, data exploration, and more. It is designed to help organizations make better decisions by providing them with the insights they need to understand their data.
(iii) What is design studio in SAP BO?
Design Studio is a tool in SAP BO that allows users to create interactive dashboards and reports using a drag-and-drop interface. It provides a range of design tools and widgets, and allows users to create custom visualizations and layouts.
(iv) What is SAP BW?
SAP BW, or SAP Business Warehouse, is a data warehousing and data modeling application developed by SAP. It is used to collect and consolidate data from various sources, and provide users with a single, integrated view of their data.
(v) What is BOXI?
BOXI, or BusinessObjects XI, is an earlier release of SAP BusinessObjects. It includes a range of BI tools and applications, including Crystal Reports, Web Intelligence, and more.
(vi) What is the semantic layer?
The semantic layer in SAP BO is a layer of abstraction that sits between the database layer and the presentation layer. It allows users to define and manage their business objects, and provides a way to access and manipulate data in a consistent and meaningful way.
(vii) What is linked universe in Business Object?
A linked universe in SAP BO is a way of combining multiple universes into a single, integrated view. It allows users to create complex reports and dashboards that span multiple data sources, without having to manually combine the data themselves.
(viii) What is a business warehouse in SAP?
A business warehouse in SAP is a central repository for storing and analyzing data from various sources. It provides a way for organizations to consolidate their data into a single, integrated view, and allows users to analyze and report on that data.
(ix) What is the difference between BW and BI?
The main difference between BW and BI is that BW is a data warehousing and modeling application, while BI is a suite of tools and applications used for reporting, analytics, and business intelligence.
(x) Explain the Web Intelligence Architecture and clearly identify the functions of Database Layer, Semantic Layer and Presentation Layer.
The Web Intelligence Architecture consists of three layers:
Database Layer: This layer contains the data sources that are used to extract data for analysis. It can include various types of databases, such as SQL, Oracle, and SAP HANA.
Semantic Layer: This layer sits between the database layer and the presentation layer, and provides a layer of abstraction that makes it easier for users to access and manipulate the data. It includes the metadata and business logic that defines how the data is organized and accessed.
Presentation Layer: This layer is where users interact with the data and the reports and dashboards are created. It includes the tools and applications used to create, manage, and share reports and dashboards.
The functions of each layer are as follows:
Database Layer: This layer provides the raw data that is used for analysis.
Semantic Layer: This layer provides a way to organize and manipulate the data, making it easier for users to access and analyze.
Presentation Layer: This layer provides the tools and applications that allow users to create reports and dashboards, and share their insights with others.