Wednesday, May 6, 2020

Ethics for Big Data and Analytics

Questions:

1. Big Data is just another form of Data Warehousing. Discuss.
2. NoSQL is rapidly becoming one of the most popular platforms for Big Data. Explain how the platform works and what its key differences from the RDBMS are.
3. What are the legal, regulatory and moral implications of deploying Big Data systems?

Answers:

1. Big Data is just another form of Data Warehousing

Big Data is the buzzword in the present world of technology. However, its essential meaning is often unclear and frequently misinterpreted. Big Data is a concept and a set of technologies for holding, storing and processing huge volumes of data, which may be structured or unstructured. Data Warehousing (DW), on the other hand, is a central repository made up of data collected from a number of different sources, which may be internal or external to the organization (www.tcs.com, 2016).

Technocrats often argue that Big Data is another form of Data Warehousing. Although there are a number of similarities between the two, the concepts are quite different in terms of application, need and components. Both work on data collected from a number of sources, but they differ in many respects. Organizations demand Big Data solutions to handle massive volumes of data and to arrive at better decisions, more revenue, more profit and a larger customer base. Data Warehousing, on the other hand, is needed by organizations to make informed decisions. It makes the data available within the organization more reliable and accessible (Inmon, 2016).

The basic difference between Big Data and Data Warehousing is that the former is a technology and the latter is an architecture. The preferred applications for Big Data are areas where unexplored business questions need to be discovered.
Big Data technologies offer lightning-fast pattern recognition that aids in this area. Data warehouses can also accomplish the task, but they are not the optimal solution because they rely on relational databases and the corresponding SQL languages. Raw and unstructured data can also be stored easily with Big Data technologies, whereas data warehouses mainly deal with aggregated data. Big Data offers an extensive environment for exploring both structured and unstructured data. In a data warehouse, the same exploration takes more time and is considerably more complex.

There are also many areas where data warehouses score higher than Big Data. These include operations in which the organization demands high data quality. A data warehouse has integrated, built-in solutions for making clean, consistent and high-quality data available for analytics. Data warehousing also wins when the organization demands low-latency, interactive reports for OLAP. Big Data technologies offer similar and alternative solutions for this task, but data warehousing is currently well ahead in this area (Dull, 2013).

Another key difference between the two is that data warehousing is generally implemented on a single relational database system that acts as the central repository, whereas Big Data technologies such as Hadoop are built to span multiple machines simultaneously.

2. NoSQL is rapidly becoming one of the most popular platforms for Big Data. Explain how the platform works and what its key differences from the RDBMS are.

NoSQL is a non-relational database environment used to handle and store huge volumes of data. It is often referred to as a cloud database or a Big Data database environment.
There are a number of NoSQL database types, such as graph databases, key-value stores, column stores and document databases (www.planetcassandra.org, 2014). NoSQL databases work on large quantities of structured, semi-structured and unstructured data. Data can be inserted into a NoSQL database without a pre-defined schema. The databases automatically spread the data over a large number of available servers, and the application does not need to be aware of this distribution. They support automatic replication, which keeps the data available in the event of a disaster or attack. They also offer integrated caching capabilities for workloads that demand high throughput and low latency.

Most of the available NoSQL databases are open source, so they can be obtained without licence fees and installed easily. They can be downloaded, implemented and scaled at negligible cost to the organization. MongoDB, Neo4j, HBase, Riak and Cassandra are some of the popular NoSQL databases (www.mongodb.com, 2016).

There are a number of differences between NoSQL databases and RDBMS. Elastic scaling is one of the prime demands placed on databases at present. NoSQL databases can expand transparently on low-cost commodity hardware, which is extremely difficult, if not impossible, with an RDBMS. NoSQL databases have been designed specifically to handle Big Data. RDBMS vendors have been working towards the same capability, but there is still a long way to go given the constraints that data volumes impose in an RDBMS environment.

NoSQL databases need only minimal involvement from database administrators (DBAs). Features such as automatic repair of data, simpler data distribution and less complex data models reduce the need for a DBA.
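The schema-free insertion, automatic distribution over servers and replication described above can be illustrated with a small sketch. This is a toy in-memory model and not the API of any real NoSQL product: the class name ToyCluster, its methods and the modulo-based placement rule are all assumptions made for illustration, and production systems typically use more elaborate placement schemes.

```python
import hashlib


class ToyCluster:
    """Hypothetical in-memory stand-in for a sharded, replicated document store."""

    def __init__(self, node_count=3, replicas=2):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes = [{} for _ in range(node_count)]  # one dict per "server"
        self.replicas = replicas

    def _owners(self, key):
        # A stable hash picks the primary node; the next nodes hold the copies.
        # Simple modulo placement is an assumption made to keep the sketch short.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        primary = int(digest, 16) % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, document):
        # No schema check: any dict is stored as-is on every owning node.
        for index in self._owners(key):
            self.nodes[index][key] = dict(document)

    def get(self, key, failed=()):
        # Read from the first owning node that is not marked as failed.
        for index in self._owners(key):
            if index not in failed and key in self.nodes[index]:
                return self.nodes[index][key]
        return None


cluster = ToyCluster(node_count=3, replicas=2)

# Two documents with different fields; no schema was declared beforehand.
cluster.put("user:1", {"name": "Asha", "email": "asha@example.com"})
cluster.put("user:2", {"name": "Ben", "tags": ["admin"], "logins": 42})

# Simulate losing the primary node for "user:2"; a replica still answers.
primary = cluster._owners("user:2")[0]
print(cluster.get("user:2", failed={primary}))
```

The caller only ever supplies a key and a document. Which server stores the data, and which copy serves a read after a failure, is decided inside the cluster object, which is the sense in which the application does not need to be aware of the distribution.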
Relational systems, by contrast, require skilled and highly experienced database administrators. The absence of a DBA can lead to significant database issues, because DBAs are involved in every phase, including the design, installation and ongoing handling of the database.

NoSQL databases are extremely cost effective because they use cheap commodity servers to handle, manage and store data. The same is not true of relational database systems, which rely on expensive storage and proprietary systems. The cost per gigabyte for NoSQL is far lower than for RDBMS.

The restrictions on the data model also differ greatly. NoSQL databases have extremely flexible data models, since they allow virtually any data structure to be stored in a data element. In relational database systems, on the other hand, change management is a major headache and may necessitate downtime or reduced service levels (Harrison, 2010).

3. What are the legal, regulatory and moral implications of deploying Big Data systems?

Big Data deals with huge volumes of data on a daily basis. For this reason, a number of legal, regulatory and moral implications are involved in the deployment of Big Data systems.

Privacy is one of the biggest legal challenges associated with Big Data. It is essential for an organization to maintain the privacy of all the information it deals with. Notice/awareness is another legal principle that must be fulfilled at all times. Under this principle, the data subject from whom information is gathered must be made aware of the uses to which his or her personal data will be put, and of the parties to whom it will be disclosed. There may be cases in which a consumer does not object to a third party using his or her personal information.
However, this arrangement must be recorded in a legal document so that no legal complications arise later. Access and participation is another area that emerges as a major legal implication of Big Data. Access to a particular piece of data cannot be granted to every type of user. Every piece of data must have a clearly and legally defined list of users who are allowed to access it. Another legal principle is "do not target/do not collect". Data collected from a number of different sources must not be used for targeted advertising unless legally defined terms and conditions permit it (Navetta, 2016).

There are also a number of regulatory implications associated with Big Data. The regulatory policies already defined for a particular organization may restrict its ability to collect data from certain sources. If such policies are not followed closely, the organization may face serious regulatory consequences. It may face charges relating to unauthorized disclosure or identity theft if it fails to meet the legal and regulatory policies (Higgins, 2016).

Ethical implications are also involved in the use and application of Big Data solutions. According to the ethics for Big Data and analytics, the context of the data must be clearly defined and understood by everyone involved. There must be no confusion regarding the purpose of the data. Consent and choice are equally important, since the associated parties need to understand what they are agreeing to. The data, and the relationships that exist between sets of data, must be reasonable. The sources used to collect Big Data must be substantiated, appropriate and authoritative. The access given to the data subject must also be ethically sound and accurately defined.
One of the prime ethical implications is accountability for the data, including the way in which mistakes and their consequences are handled and repaired (Chessell, 2016).

References

Chessell, M. (2016). Ethics for big data and analytics. [online] Available at: https://www.ibmbigdatahub.com/sites/default/files/whitepapers_reports_file/TCG%20Study%20Report%20-%20Ethics%20for%20BD%26A.pdf [Accessed 8 Aug. 2016].

Dull, T. (2013). The 5 Ws: When should we use big data vs. data warehousing technologies? [online] The Cyberista Says. Available at: https://tamaradull.com/2013/03/20/the-5-ws-when-should-we-use-big-data-vs-data-warehousing-technologies/ [Accessed 8 Aug. 2016].

Hadjigeorgiou, C. (2013). [online] Available at: https://static.ph.ed.ac.uk/dissertations/hpc-msc/2012-2013/RDBMS%20vs%20NoSQL%20-%20Performance%20and%20Scaling%20Comparison.pdf [Accessed 8 Aug. 2016].

Harrison, G. (2010). 10 things you should know about NoSQL databases - TechRepublic. [online] TechRepublic. Available at: https://www.techrepublic.com/blog/10-things/10-things-you-should-know-about-nosql-databases/ [Accessed 8 Aug. 2016].

Higgins, J. (2016). FTC Issues Regulatory Warning on Big Data Use. [online] Ecommercetimes.com. Available at: https://www.ecommercetimes.com/story/83004.html [Accessed 8 Aug. 2016].

Inmon, B. (2016). Big Data Implementation vs. Data Warehousing by Bill Inmon - BeyeNETWORK. [online] B-eye-network.com. Available at: https://www.b-eye-network.com/view/17017 [Accessed 8 Aug. 2016].

Navetta, D. (2016). Legal Implications of Big Data. [online] Available at: https://c.ymcdn.com/sites/www.issa.org/resource/resmgr/journalpdfs/feature0313.pdf [Accessed 8 Aug. 2016].

www.mongodb.com, (2016). NoSQL Databases Explained. [online] MongoDB. Available at: https://www.mongodb.com/nosql-explained [Accessed 8 Aug. 2016].

www.planetcassandra.org, (2014). NoSQL Databases Defined Explained. [online] Planet Cassandra. Available at: https://www.planetcassandra.org/what-is-nosql/ [Accessed 8 Aug. 2016].

www.tcs.com, (2016). Big Data for Data Warehousing. [online] Available at: https://www.tcs.com/SiteCollectionDocuments/White-Papers/BFS-Whitepaper-Big-Data-Warehousing-0313-1.pdf [Accessed 8 Aug. 2016].
