Title: Context-Based Personal Data Protection in Smart City
Research Background, Objectives and Significance. Research on Concept and Connotation of Smart City. Characteristics of Smart City. Importance of Personal Data Classification. A Classified Personal Data Protection Architecture. Services in Smart City.
Subject | Programming, computers and cybernetics |
Type | diploma thesis |
Language | English |
Date added | 23.09.2018 |
File size | 964.6 K
Chapter 4 Personal Data Security in Smart City
4.1 Privacy Concerns in Smart City
As analyzed before, the framework of a smart city can technically be divided into four layers: the physical perceptual layer, the network communication layer, the data layer and the application layer. The privacy threats in each layer are identified in the following:
When collecting data in the perceptual layer, transmission is mostly realized via wireless networks. Without effective protection measures, such signals exposed in public places can easily be illegally monitored, stolen or interfered with. Sensor nodes are usually deployed in unsupervised environments. Their computing resources are very limited, which makes complex security strategies difficult to apply; their processing capability is weak and their logic is usually relatively simple. Therefore, sensor nodes face security problems such as: network connections may be intermittent; sensor nodes may be occupied or stolen; data may be forged; and traditional security strategies may be difficult to implement [50].
The network infrastructure of smart cities requires secure and reliable communication, and it comprises a variety of network forms, such as cellular networks, the Internet, satellite networks, and municipal and enterprise intranets. Consequently, the vulnerabilities of these networks are also introduced into the network infrastructure of a smart city. The more prominent security risks in a smart city include: when a large number of devices access the network, authentication and key generation strategies become very challenging; and data transmission and information exchange between heterogeneous networks may become vulnerabilities, making man-in-the-middle attacks and other types of attacks difficult to avoid.
Since data computing and storage in a smart city usually happen in the cloud, which has a very high concentration of resources, it faces the problems of data centralization security, data reliability and the security of the cloud platform itself. The cloud platform may become more vulnerable to attack precisely because of this high concentration. In addition, the resource scheduling and operation management of the cloud platform depend heavily on virtualization technology. Security loopholes and attacks aimed at it, such as covert channels and virtual machine escape, will also threaten the security of the cloud platform [39].
The integration of data and services across various industries and fields makes data classification management and confidentiality protection more important, and it may introduce security risks such as interconnection interface security, user privacy disclosure, sensitive data theft, service availability, and cross-domain access and authentication. There is a large amount of user data in a smart city system; once this information is leaked through unauthorized access, it will pose serious security threats to the public. Smart cities host a wide range of data types, and data mining on integrated data may produce new data with greater value; the leakage of such data may even result in significant political and economic losses. The large number of RFID and other devices in a smart city system will bring new security problems to the traditional identity authentication system, especially in scenarios of cross-domain access and authentication.
(4) PCs and mobile terminals are the main means for users to access smart city services. On the user side, a weak sense of security or incorrect configuration may cause personal data leakage directly. Security risks may also exist in the mobile processor, the operating platform, or in the software itself.
4.2 Types of Attacks in Smart City
In a smart city, the massive volume of interconnected sensors, servers, databases and other infrastructure extends the attack surface. Violations of data security can cause compromise across the system, and infections can easily propagate between systems. It is important to recognize the attack layers of a smart city before we start the design of a personal data protection architecture.
Apart from fraud on the human side, the main types of attacks can be divided into four categories targeting different layers of the smart city architecture: information theft, wireless jamming, cipher leakage and DoS attacks [8].
Information Theft
Topology Probing
Given the nature of wireless communication, attack nodes can usually monitor packets within a certain range and analyze their basic information, such as device addresses and network identification numbers. With this information, an attack node can reconstruct part of the network topology. The structure of some networks and the importance of nodes within them can be inferred from the topology, and corresponding destructive measures can then be initiated.
Replay Attack
Attack nodes can mount replay attacks by intercepting packets and retransmitting them. Especially in a smart home network, replay attacks on command information may trigger illegal actions, which will seriously affect people's privacy and threaten the security of their property.
Router Forgery
Forging a legal router node in the network is far more serious than forging an ordinary node, since a forged router can generate deception that misleads node data into the forged router (e.g. wormhole attacks and Sybil attacks) and leads to information disclosure. Especially in cluster-structured sensor networks, the higher the level of the forged router, the greater the impact.
Wireless Jamming
Wireless jamming is an inherent hazard of wireless sensor networks. When an attack node occupies the wireless resources of the communication band, the network becomes congested; in particular, when the attack node interferes with the sink node, the normal operation of the whole network is affected. Therefore, as long as the attack node finds the sink node through network topology analysis or traffic analysis, it can concentrate the attack on the sink node. If there are multiple sink nodes in the WSN and the traffic is dispersed among them, the impact of this attack mode can be minimized.
Cipher Leakage
In open wireless sensor networks, the management and distribution of cipher keys has always been a difficult problem, and cipher leakage occurs easily when security is not guaranteed. For instance, in ZigBee 2007, when a node does not hold any cipher code, the coordinator or Trust Center sends the cipher code to the node in plaintext. Once an attack node captures this information while monitoring, it can decrypt the information transmitted in the network. It can even spoof encrypted data to extract more information from the attacked network. As a result, the security of the entire network is compromised once the cipher code is compromised.
DoS Attack
DoS attacks are an easily triggered problem in WSNs. They take several forms in sensor networks: Resource Depletion (RD) attacks, congestion attacks, wrong routing of packets, and so on. A DoS attack on a single terminal node does not cause serious problems for the entire WSN, since the number of deployed terminal nodes is large enough to make this impact negligible. If the DoS attack hits the sink node, however, it will cause a severe security problem for the whole network and greatly reduce network performance. Thus, a sink node with strong processing capability, or the deployment of multiple sink nodes, can minimize the impact.
4.3 Personal Data Classification in Smart City
In the 19th century, the birth of camera technology gave rise to the concept of portrait rights. Similarly, the continuous development of technology and the worldwide construction of smart cities have raised people's awareness of personal data protection. The main goal of smart city construction is to optimize city management and improve citizens' living experience. Driven by this, for example, a government may use citizens' location data to analyze real-time road conditions and visitor flow, and thereby better arrange transport resources. Likewise, smart home devices may record your personal data and habits to act autonomously or make recommendations, as many websites do. Different services may require specific types of data from the urban database. At the same time, citizens engaged in city life are exposing their personal data to a greater or lesser extent, either consciously or unconsciously.
4.3.1 Importance of Personal Data Classification
Data classification is a process used to optimize data security and data protection programs, procedures and processes. Data needs to be classified based on its sensitivity type and the level of impact on the related entities if that data is destroyed, changed or disclosed. The concept first appeared in the UK in the late 19th century and formed part of the Official Secrets Act 1889, entitled “An Act to Prevent the Disclosure of Official Documents and Information”. Yet, despite having been around for over 126 years, data classification is not well conducted in many organizations apart from government bodies and large institutions.
The Ponemon Institute is an outstanding and authoritative research center committed to privacy, data protection and information security policy. A data security report released by the Ponemon Institute, which surveyed 1,587 IT and security practitioners in 16 countries and regions around the world, revealed some interesting and valuable findings: data security managers consider the unidentifiable location of sensitive and confidential data more troubling than hackers or malware, and sensitive data classification was selected as the top technique in both unstructured and structured data protection. In short, data classification is the first consideration when planning sensitive data protection schemes. The practical application of data classification technology needs to overcome several technical difficulties to become more accurate and better performing [40]. In this respect, traditional keyword filters and regular expression technology are hardly fully adequate, and new technical means are needed.
The General Data Protection Regulation (GDPR) is a regulation in EU law on data protection and privacy for all individuals within the European Union. It also addresses the export of personal data outside the EU. The GDPR aims primarily to give citizens and residents back control over their personal data and to simplify the regulatory environment for international business by unifying regulation within the EU. It has become the most important and authoritative basis for data protection in Europe. At the same time, according to a survey conducted by PwC, 92% of American companies also believe that the GDPR will become the most important data protection measure. Data classification is one of the main points proposed by this regulation: it helps to understand and sort out what is important, which also reduces storage costs, another key consideration under the GDPR [41]. Furthermore, the regulation guarantees the safety of various types of personal data, including basic identity information (name, address, ID card number, etc.), network data (location, IP address, cookie data, RFID tags, etc.), healthcare data, biometric data (fingerprints, irises, etc.), as well as racial data, political views and sexual orientation. The implementation of the GDPR is a milestone for the protection of personal data [52].
Personal data is becoming one of the most valuable assets, since business has endowed it with great potential for developing personalized services. Although people already realize that data security is the most urgent issue accompanying the explosive development of data science and data technologies, the volume of data leakages has still shown an upward tendency over the past decade. According to the Global Data Leakage Report for H1 2017, personal data leakage declined in percentage compared with the same period of 2016, but it still accounted for 65.8% of the total, exceeding the sum of billing information, state secrets and trade secrets. The report also classifies data leakages by industry and analyzes their distribution. According to the statistics, personal data related to banking, finance and healthcare is in the greatest danger; other situations vary among industries and data types [24].
International legislation on personal data protection has strict provisions prohibiting the disclosure, dissemination and use of special types of personal data, such as the EU Data Protection Directive (1995), Article 8, “The processing of special categories of data”. As early as 2010, the National Institute of Standards and Technology (NIST) in the United States issued the SP 800-122 standard to guide the protection of personally identifiable information. In this document, personal data is classified into three grades (low, moderate and high) according to the level of adverse effect caused by data disclosure [37]. It is important to realize that certain types of personal data are more sensitive and private than others. Especially with the advent of the big data era and the popularization of related technologies, Intel forecast that the global data volume would reach 44 ZB in 2020. In these circumstances, distinguishing the types of data and their sensitivity levels, and then deploying corresponding levels of protective measures, is an efficient and cost-effective approach to data protection, especially personal data protection.
4.3.2 Types of Personal Data Involved in Smart City
To better understand the personal data involved in different services and scenarios in a smart city, this part introduces some proposed privacy taxonomies to aid the analysis. Finn et al. [18] proposed seven types of privacy: privacy of the person, of thoughts, of behavior, of communication, of association, of data and image, and of location.
However, these categories are not delineated clearly enough for the smart city context, since some distinctions are unnecessary, and data types and scenarios in a smart city are more complex and interconnected. For example, the categories “communication” and “association” proposed by Finn et al. cannot be separated, since there is always some form of association when people communicate with others. Based on the proposed taxonomies of personal privacy and on smart city scenarios, the types of personal data involved are illustrated as follows:
1. Personally Identifiable Data
Identity data can be used, individually or in combination, to identify a particular user. It mainly includes basic personal information such as name, age, address and ID number, which can identify people directly. It also includes users' contacts and other virtual identity information that can help to identify someone indirectly.
2. Personal State Data of Body and Mind
The state of body and mind mainly covers people's physical characteristics, such as biometrics, genome and blood type, and mental states, such as emotions, political opinions and sexual preference. It also includes people's financial status, marital status, health status, education background, work experience and so on.
3. Social Data
Social media has become one of the most important sources of personal data, since almost everyone engages with it in daily life. A person's social data covers his or her social interactions, such as following and followers, posts and so on. It can be used to analyze people's behavior or status, or to predict people's preferences. Besides, many other types of privacy may be derived through social data analysis. For example, some institutions have tried to predict public health issues or the next financial crisis by analyzing social network data, and the results showed that this is indeed feasible.
4. Network Data
Privacy-related network data includes all information involved when people use Internet services in a smart city. The main contents include user consumption information, service subscription relationships, terminal information, access information (such as IP addresses), location, and network behavior records (such as online shopping records, search history or cookies). All these data can be collected and tracked for action analysis, purchase pattern analysis and so on, and can therefore reveal other types of privacy.
5. Location Data
Citizens' location data may be collected for smart transportation, spot recommendation and many other smart city services, since it does not only refer to a geographic location but also reveals when that location was visited. In a smart city, location data extends to any spatio-temporal information, including the home and workplace, and it can expose other types of privacy such as purchase preferences, social life, etc.
4.3.3 Personal Data Categories and Samples
In accordance with the value and security risk of personal data in various contexts, we can vary the degree of protection and impose different service requirements on service providers. Three factors shall be considered when differentiating the level of protection [45]:
1. Whether these data can directly identify a specific user;
2. How closely these data are tied to the user's life;
3. Whether the user has consented to the publication of the data.
Taking the above factors into consideration, we can draw a workflow to evaluate personal data categories.
Fig 6. Theoretical Flow Chart of Personal Data Evaluation
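As an illustration, the three evaluation factors above can be sketched as a toy decision function. The category names reuse those of the classification matrix (restricted, confidential, proprietary), plus an assumed "public" level for data the user has agreed to publish; the decision rules themselves are illustrative assumptions, not the definitive workflow.

```python
def classify_sensitivity(directly_identifying, tied_to_life, consent_given):
    """Toy decision rule mirroring the three evaluation factors.

    The thresholds and the assumed "public" category are illustrative
    only; a real deployment would follow the full evaluation workflow.
    """
    if consent_given:
        return "public"
    if directly_identifying and tied_to_life:
        return "restricted"
    if directly_identifying or tied_to_life:
        return "confidential"
    return "proprietary"

# An ID number is directly identifying and closely tied to the user's life:
assert classify_sensitivity(True, True, False) == "restricted"
# A post the user consented to publish needs only minimal protection:
assert classify_sensitivity(True, False, True) == "public"
```

In practice the two boolean factors would themselves be graded (as in the low/moderate/high scale of NIST SP 800-122), but a binary rule is enough to show how the factors combine.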
To gain a better understanding of the different personal data categories, we present a matrix of the above four categories of personal data, based on the information classification matrix released in ISO 27001 [4].
Table 1. Classification Matrix of Personal Data
Restricted data is given the highest security level, and therefore stronger privacy protection techniques are needed to ensure data safety, for example deploying access control, identity authentication and data encryption at the highest level. Confidential data may include home addresses, mailbox addresses, telephone and SMS records, etc.; the security level of the deployed protections can be lower than for the former, or some protection that is necessary for restricted data may not be required for confidential data. Lastly, for proprietary data, simple privacy protection technologies such as identity authentication can be adopted, so that such information can only be accessed by authenticated users.
Before carrying out privacy protection, we should first define what category the user's personal data belongs to and what attributes the data has. Certainly, this division is not static and can be expanded according to different industries, businesses and security needs. Especially in a smart city, personal data may not have clear distinctions, or several types of personal data may exist in one file and be impractical to separate; in that case it is suggested to apply the highest level of protection among the data categories present in the file. Meanwhile, the possible risks and their levels may also change depending on the specific case.
Hence, to deploy all-around personal data protection in a smart city, it is important to first classify the collected personal data. A proposed personal data classification architecture is introduced in the next chapter.
Chapter 5. A Classified Personal Data Protection Architecture
After the previous analysis of the smart city architecture, along with the illustration of the types and sensitivity levels of personal data involved in a smart city, the flow of personal data collection, transmission and processing in the smart city has been presented, and the context and sensitivity level of each category of personal data have been described with examples. Under these prerequisites, a classified personal data protection architecture is designed and proposed. It covers the processes from network data collection, data classification, and data computing and storage, to data application in services.
5.1 Overall Description
In this architecture, security protection is divided into four levels in accordance with the data flow in the smart city architecture.
Fig. 7 Component Diagram of the Context-Based Personal Data Protection Architecture
The architecture principle can be summed up as follows:
To safeguard personal data from the very beginning, the collected data shall be encrypted as soon as it is gathered. The encrypted data is then transmitted; before data classification, it is necessary to authenticate the encrypted data arriving from the various transmission and perceptual networks and to filter out irregular data. The classification workflow then starts, and its output is classified personal data labeled with different sensitivity levels. The classified data must then pass through a gateway filter to eliminate errors, after which the virtual storage space of the data is determined according to the data's target address. The classified data is stored in different clouds according to its labeled sensitivity level; attribute-based encryption and access control are also applied to safeguard the security of personal data in the cloud. Meanwhile, the data processing module performs various analysis operations on data loaded in virtual machines, including parsing, checking, correcting, integrating, recognition and so on. Data processing modules work under the security protection module to avoid revealing personal data; for example, behavior monitoring is essential, since various data operations are conducted at different levels by different data processors for diverse tasks. After data is fused in an application or service, auditing and leak prevention are the main policies that need to be implemented. On the other hand, securing the application itself is another approach to protecting the personal data involved.
There are two ways to implement the proposed architecture: a) develop corresponding security components and implant them into the smart city architecture to implement security protection; this mode requires a separate security component for each data processing module of each smart city application; b) develop a common security sub-module system and configure it into the smart city architecture; this mode is separate from each data processing module and provides unified protection for personal data in the various virtual machines during the whole process.
5.2 Hierarchical Elaboration
As mentioned before, the security system in a smart city can be regarded as three-dimensional: the X-axis refers to security mechanisms, the Y-axis is based on the OSI (Open Systems Interconnection) model, and the Z-axis represents different security services. In the proposed architecture, the seven layers of the OSI model are replaced with four layers of the data operation flow, and each layer is explained in detail with regard to its operation, security mechanism and security services.
Data Collection
Previously, the main sources of personal data were statistical data from national statistical departments, or data collected by enterprises and other organizations. With the continuous development of the Internet in recent years, the network has become an important source of data and information, such as web data recorded by search engines or social network data on Twitter.
At this level, raw information collected from sensors is stored for further processing. Some of the formats in which heterogeneous data are collected are CSV files, tweets, database schemas and text messages. The collected formats are then processed using semantic web technologies in order to convert them into a common format. To ensure data security in transmission, it is necessary to encrypt data before transmission; otherwise the data will be transmitted in plaintext, which is highly risky. There are many mature encryption methods, and the best practice in this circumstance is to use homomorphic encryption, so that data can be further processed without decryption.
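The homomorphic idea can be sketched with a toy Paillier-style cryptosystem: two encrypted sensor readings can be summed without ever decrypting them. The primes below are purely illustrative (real deployments use moduli of 2048 bits or more), and this sketch is not production cryptography.

```python
import math
from secrets import randbelow

def keygen(p=104723, q=104729):
    # Toy primes for illustration -- far too small for real security.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)              # modular inverse; exists since gcd(lam, n) == 1
    return n, (lam, mu, n)

def encrypt(n, m):
    while True:
        r = randbelow(n - 1) + 1      # random blinding factor, coprime to n
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(key, c):
    lam, mu, n = key
    u = pow(c, lam, n * n)            # u = 1 + m*lam*n (mod n^2)
    return ((u - 1) // n) * mu % n

n, priv = keygen()
c1, c2 = encrypt(n, 20), encrypt(n, 22)
# Multiplying ciphertexts adds the underlying plaintexts without decryption:
assert decrypt(priv, (c1 * c2) % (n * n)) == 42
```

The key property, ciphertext multiplication corresponding to plaintext addition, is what lets a cloud aggregate encrypted readings (e.g. traffic counts) without seeing individual values.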
This involves a rich interplay between many disciplines, such as signal processing, hardware design, supply-chain logistics, privacy rights, and cryptography [5]. Security measures, such as the trusted platform module (TPM) for RFID privacy, key management and quantum cryptography, need to be properly applied to ensure reliable and secure authentication, as well as the integrity and confidentiality of data and metadata.
At this layer, the security mechanism ensures the security of perceptual devices and actuators. The perceptual devices collect data from the outside environment, and the actuators are responsible for receiving these data and reacting to the perceived results; ensuring their security is therefore the ultimate guarantee of personal data security. To this end, security services such as log collection, environment monitoring and vulnerability scanning are basic and necessary modules to deploy at this layer [32].
Data Classification
The main task of the data classification layer is to conduct personal data classification based on context. The encrypted data from the various transmission and perceptual networks is authenticated first, before classification, to filter out irregular data. As shown in Fig 7, the process of personal data classification can be divided into four steps. The specific operations and the techniques involved are explained in the following.
Step 1. Personal data recognition
The recognition of personal data means sorting out all data of the target system and extracting the sensitive data from it. This step determines whether personal data exists in a certain file. The proposed techniques for this task are regular expression matching and named-entity recognition.
a. Regular Expression Matching. This is the most commonly used method to extract a certain data type. It easily locates personal data such as names, email addresses, phone numbers, ID numbers, etc. For instance:
var reg = /^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/; // email address recognition
var reg = /^(1\d{2}|2[0-4]\d|25[0-5]|[1-9]\d|[1-9])\.(1\d{2}|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.(1\d{2}|2[0-4]\d|25[0-5]|[1-9]\d|\d)\.(1\d{2}|2[0-4]\d|25[0-5]|[1-9]\d|\d)$/; // IPv4 address extraction
Regular expression matching is easy to set up, but it has certain limitations. Name formats differ between countries, so different expressions need to be deployed for accuracy; the same applies in other circumstances, such as cellphone number matching and ID matching. As an alternative, keyword matching or keyword-pair matching can help to mitigate this problem. Regular expression matching and keyword matching are good at literal text, while other options are needed when semantic analysis is required.
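The recognition step described above can be sketched in Python's `re` module. The two patterns below (emails and IPv4 addresses) are illustrative only; a real deployment would add locale-specific rules for names, phone numbers and ID formats.

```python
import re

# Illustrative patterns only -- real systems need locale-specific rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4":  re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
                        r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b"),
}

def recognize(text):
    """Return {category: [matches]} for every pattern that fires on the text."""
    hits = {name: rx.findall(text) for name, rx in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

sample = "Contact alice@example.com from host 192.168.0.7"
print(recognize(sample))
# {'email': ['alice@example.com'], 'ipv4': ['192.168.0.7']}
```

Note that all groups are non-capturing (`(?:...)`), so `findall` returns whole matches rather than group fragments; this keeps the output directly usable for tagging.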
b. Named-entity Recognition. The Stanford Named Entity Recognizer (NER) is one of the accomplishments of the Stanford University natural language research group. It is a Java-implemented named entity recognition program that can tag the entities in a text by class, such as names of people, company names, regions, genes and proteins. Stanford NER is based on a trained model with seven classes of identifying attributes: time, location, organization, person, money, percent and date. In comparison, another way to implement named-entity recognition is to use the NLTK package in Python. NLTK is a natural language toolkit implemented by the University of Pennsylvania in the Python language; it collects a wide range of open datasets and models and provides a comprehensive, easy-to-use interface that covers POS tagging (part-of-speech tagging), named entity recognition, syntactic parsing and other NLP functions [47].
By contrast, the named-entity recognition in NLTK annotates entities such as names, places and organizations, but it also marks syntactic components such as the predicate or object, which causes redundancy in the output and requires further cleansing and screening, while Stanford NER can clearly label the seven types of entities without redundant words. However, because Stanford NER is developed in Java, certain problems may arise from jar packages or path issues when invoking it from other languages.
Beyond that, other solutions, such as DOM parsing, are also available depending on the data format and recognition requirements.
Step 2. Preprocessing
Once personal data is detected and recognized, the text is preprocessed for further classification. Preprocessing is an important and necessary step to improve classification accuracy. If the input is the content of a web page, we should first remove the HTML tags, for example using lxml or html5lib in Python. The necessary steps here are stop-word elimination and stemming, both of which can be realized with Python NLTK. Besides, it is also optional to remove punctuation marks and special characters, or to check the spelling.
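The preprocessing chain can be sketched in pure Python. The tiny `STOP_WORDS` set and the `crude_stem` function below are deliberately simplified stand-ins for NLTK's stop-word corpus and PorterStemmer, used here only to show the shape of the pipeline.

```python
import string

# A tiny stop-word list stands in for nltk.corpus.stopwords here.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def crude_stem(word):
    # Naive suffix stripping -- a placeholder for NLTK's PorterStemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()
    # Strip punctuation, then tokenize, drop stop words, and stem.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

print(preprocess("The user is sharing location records in the city."))
```

In practice NLTK's stemmer produces better stems (e.g. "sharing" becomes "share" rather than "shar"), but the stages of the pipeline, normalization, stop-word removal, and stemming, are the same.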
Step 3. Vector space modeling
After preprocessing, the data is dispatched for vector space modeling. After reading, people can derive rough knowledge from content according to their own understanding, but a computer cannot easily “read” the data semantically, since fundamentally it only understands 0 and 1. Thus, the text should be converted into a format that the computer can recognize for further context-based classification. According to the “Bayesian hypothesis”, it is assumed that the words that make up a text are independent of each other in determining the text category; this set of words can then be used to represent the text itself.
Vector space modeling is the main method adopted to represent the text. Its basic idea is to represent a text by a vector W1, W2, W3, ..., Wn, where Wi is the weight of the i-th feature. Generally, single words perform better than word pairs as features. The first step is to segment the text and represent it using these feature words as the dimensions of the vector; the feature weights are then calculated with the TF-IDF algorithm. After this calculation, the vector space modeling is basically complete.
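The TF-IDF weighting step can be sketched as follows. This plain implementation over tokenized texts uses the common tf * log(N/df) form; the toy corpus is invented purely for illustration.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of token lists; returns one {term: weight} dict per text."""
    n_docs = len(corpus)
    # Document frequency: in how many texts does each term appear?
    df = Counter(term for doc in corpus for term in set(doc))
    vectors = []
    for doc in corpus:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = [["bank", "credit", "loan"],
        ["bank", "river", "water"],
        ["credit", "card", "fraud"]]
vecs = tf_idf(docs)
# "loan" appears in only one text, so it outweighs the more common "bank":
assert vecs[0]["loan"] > vecs[0]["bank"]
```

A term that occurs in every document gets weight zero under this scheme, which is exactly the intended effect: ubiquitous words carry no class information.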
Step 4. Classification
After the data has been transformed into the vector space model, it can be categorized by the already trained classifier. The output of this step is personal data labeled with different category numbers; at this point, the data classification is basically complete.
To conclude, two layers constitute the data classification process: the first layer uses high-speed regular expression matching and named-entity recognition to filter, and the second layer adopts an SVM-based text classifier to implement context-based personal data classification.
To better understand the classification principle, the flowchart for building the classifier is displayed and explained in the following.
Fig 8. Steps of Classifier Construction
As we can see, the original datasets include a training set containing data with known, manually labeled categories, and a test set used to evaluate the classification accuracy. Here we assume that there are n categories altogether.
Preprocessing, as mentioned above, helps to reduce words with duplicated meaning and to remove meaningless words before feature selection.
Feature selection is an essential step because it identifies the representative words of each data category. The most popular approaches are chi-square, KL divergence, etc. The choice of method and its performance should be considered in practice, because apart from the classification algorithm, the feature selection algorithm can also make a huge difference to the classification results. Here we assume that m word features are chosen for each category; for example, in the category “finance data”, possible features might be “bank”, “credit”, “dollar”, etc. A dictionary containing m*n words (features) is thus built.
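As an illustration of chi-square feature selection, the sketch below scores one term against one category over a toy labeled corpus (the documents and labels are invented for the example). A term concentrated in one category scores higher than a term spread evenly across categories.

```python
def chi_square(term, category, docs):
    """2x2 chi-square statistic for one (term, category) pair.

    docs is a labeled corpus: a list of (token_set, label) pairs.
    """
    a = sum(1 for toks, lab in docs if term in toks and lab == category)
    b = sum(1 for toks, lab in docs if term in toks and lab != category)
    c = sum(1 for toks, lab in docs if term not in toks and lab == category)
    d = sum(1 for toks, lab in docs if term not in toks and lab != category)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

docs = [({"bank", "loan"}, "finance"),
        ({"bank", "credit"}, "finance"),
        ({"fever", "loan"}, "health"),
        ({"fever", "clinic"}, "health")]

# "bank" occurs only in finance texts, so it is far more discriminative
# than "loan", which is spread across both categories:
assert chi_square("bank", "finance", docs) > chi_square("loan", "finance", docs)
```

Selecting the m highest-scoring terms per category yields exactly the m*n feature dictionary described above.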
The text data in the training set is then represented in the vector space model using the TF-IDF algorithm. It is worth noting that when the number of words constituting the text is large, the dimensionality of the vector space representing the text also grows, and in a smart city environment it can reach tens of thousands of dimensions. In this circumstance, dimension reduction is necessary to improve efficiency and computing speed. Another reason is that words differ in their significance for classification: general, common vocabulary contributes less than words that have high frequency only in specific texts. To improve classification accuracy, we need to remove words with low expressive power and select a set of features for each class. After vector space modeling, a feature vector representing the text is generated.
These vectors are then used to train the classification algorithm, which is the key process in data classification. There are many classification algorithms based on the vector space model, such as Support Vector Machine (SVM), the Maximum Entropy method, the K-nearest Neighbor method (K-NN) and the Bayesian method. The following introduces two representative classification algorithms:
The Bayesian algorithm. The basic idea is to compute the probability that a text belongs to a certain class, expressed as a combination of the probabilities that each word in the text belongs to that class. The classification principle is to use the prior probability of a text to calculate the posterior probability for each class; the class with the maximum posterior probability is then chosen as the class of the text.
K-nearest Neighbor method. After a new text is given, the algorithm considers the K texts in the training set with the smallest distance (highest similarity) to the new text, and assigns the new text to the class held by the majority of those K texts. The input consists of the K closest training examples in the feature space, and the output is a class membership. Usually, we set an initial value for K first, then adjust it according to experimental results.
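The K-NN voting scheme described above can be sketched as follows, using cosine similarity over the TF-IDF vectors. Cosine similarity is a common, though not the only, choice of distance for text vectors; the toy training data in the usage example is an illustrative assumption.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts: feature -> weight)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(train, new_vec, k=3):
    """train: list of (vector, label) pairs; majority vote of the k nearest."""
    nearest = sorted(train, key=lambda p: cosine(p[0], new_vec), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With K = 3 and a new text whose features overlap mostly with finance-labeled training texts, two of the three nearest neighbours vote “finance” and that label wins.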
Last but not least, the test set should be used to evaluate the classifier and check the accuracy.
Data Storage and Processing
After the implementation of context-based data classification, the collected personal data is labeled into the categories discussed in previous chapters. Different data categories require different security levels. The storage layer consists of two parts: private cloud storage and public cloud storage.
Fig.9 Loading and Storage of Classified Personal Data
Personal data classified at a higher security level (proprietary, confidential and restricted) should be stored in the private cloud; in special circumstances it may also be stored in the public cloud, provided the data is strongly encrypted. Data classified at the lower security level (public) must pass through the security gateway filter before being dispatched to the public cloud, to recheck whether any sensitive data remains unrecognized. Any sensitive data found is intercepted and stored in the private cloud, and the rest is stored directly in the public cloud. The implementation of data storage is based on a hybrid cloud architecture. In rare cases, if the private cloud is not available, the user intranet can serve as an alternative as long as it has sufficient security protection.
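The dispatch logic described above can be sketched as a small routing function. The level names follow the classification used in this paper, while the `gateway_scan` callable stands in for the real security-gateway filter and is an illustrative assumption.

```python
# Security levels that must stay in the private cloud, per the
# classification scheme discussed in this paper.
HIGH_LEVELS = {"proprietary", "confidential", "restricted"}

def route_record(record, gateway_scan):
    """Decide the storage target for one classified record.

    record: dict with "level" and "payload".
    gateway_scan: callable that rechecks public-bound payloads for
    sensitive data the classifier missed; returns True if sensitive.
    """
    if record["level"] in HIGH_LEVELS:
        return "private_cloud"      # must also be encrypted at rest
    # public-level data is rechecked before leaving the trust boundary
    if gateway_scan(record["payload"]):
        return "private_cloud"      # intercepted: sensitive data was missed
    return "public_cloud"
```

The same function could return "intranet" as a fallback target when the private cloud is unavailable, mirroring the alternative noted above.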
After the data has been assigned to the different clouds, the original encryption should be replaced with attribute-based encryption (ABE). Under ABE, data is marked with different attributes and then encrypted according to those attributes. This kind of encryption is especially suited to the construction demands of diverse smart city services and applications. After encryption, an attribute-based signature scheme provides high-level protection for users' personal data by signing the encrypted data. Another option for better security is to deploy attribute-based access control (ABAC). ABAC defines an access control paradigm whereby access rights are granted to users through policies that combine attributes. The policies can use any type of attribute, such as user attributes, resource attributes, object and environment attributes, or attributes from other systems in the smart city. Compared with other types of access control, ABAC is considered the "next generation" authorization model because it provides dynamic, context-aware and risk-intelligent access control, giving the smart city flexibility in implementations based on existing infrastructures [62].
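A much-simplified sketch of ABAC policy evaluation is given below. Real ABAC engines (e.g. XACML-based ones) support far richer conditions, obligations and combining rules, but the attribute-matching core is the same idea; all attribute and policy names here are illustrative.

```python
def abac_allows(policy, user, resource, environment):
    """Grant access only if every attribute the policy names matches.

    policy maps "user" / "resource" / "environment" to the required
    attribute values; attributes the policy does not mention are
    unconstrained. All names are illustrative.
    """
    parts = {"user": user, "resource": resource, "environment": environment}
    return all(
        parts[section].get(attr) == value
        for section, required in policy.items()
        for attr, value in required.items()
    )
```

For example, a policy might require role "doctor", resource category "health" and an intranet connection, denying the same request from an unknown role or an external network.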
Data processing is the most essential phase in smart city construction, since information of great value can be extracted through various data analysis operations, among them data cleansing, massive data aggregation, data mining, dynamic modeling of associated data and data vitalization. Because data resources are heterogeneous and dynamic, even anonymized data can be processed to uncover users' private information through association analysis, clustering and other data mining methods. Computation security shall therefore be taken as the first priority; basic infrastructure security and storage security equally require comprehensive safeguards. The corresponding security services are operation monitoring, audit logging, situation awareness and exception warning. The operation monitoring service conducts a) data processing and results monitoring, b) analysis of relevant data flow characteristics, c) assessment of possible operations and actions triggered by data, and d) determination of whether the data exceeds critical thresholds. Once abnormal "hazard" data is found, the anomaly monitoring module must handle it in time. The exception warning service shall cover both known "hazard" data streams and other uncertain "hazard" data; in addition, the system shall notify the relevant module or system and execute access control for uncertain "hazard" data [49].
Data in Application
Vitalized data can be utilized in different applications for smart city construction. The inferred data can be used in many ways, such as input/output, messaging, alerts or warnings. The security mechanism in this layer mainly concerns the application itself and can be specified as platform security, operating system security, software security, terminal security and web security.
Monitoring the use of sensitive data refers to monitoring the data transfer, storage and use involved in application operation, development testing and external transmission, detecting violations in time and proceeding according to set rules. Sensitive data monitoring in the application layer focuses on application monitoring and terminal monitoring.
Application monitoring is a process that ensures a software application processes and performs in the expected manner and scope. In this architecture, it refers to access monitoring that combines the relations between traffic, business operations and sensitive data, analyzing all application traffic via traffic collection. Many IT companies such as Microsoft, Dell, IBM and Oracle have developed mature application monitoring tools, and there are also capable open source tools like Nagios.
Terminal usage monitoring refers to the security management and monitoring of terminals. According to the subordinate relationships of organizations, departments and users, corresponding rights rules for equipment and stored data (permission/supervision/prohibition, etc.) can be defined. If sensitive data violations occur, such as sensitive data residing in an insecure location, illegal printing of files containing sensitive information, or illegal copying or transfer of files containing sensitive data via instant messaging, mail and other tools, the risk can be reported [65]. If an end user tries to manipulate sensitive data without authorization, an alarm prompt is triggered, and operations that seriously violate the security policy can be interrupted directly.
Sensitive data audit is another effective protection service applied after data has been used. Auditing refers to the discovery of leakage points; leakage prevention refers to finding the source of a leak when one occurs and preventing further diffusion of sensitive data. Auditing sensitive data means log collection and audit analysis of all behaviors that access sensitive data. It is a kind of inquiry and forensics function, provided that the related systems log users' data access behavior, including the management of sensitive data, data desensitization, sensitive data usage monitoring and sensitive data leakage logging [11]. This service can be realized with developed tools (e.g. IDERA SQL Compliance Manager) or by designing a module that satisfies specific audit requirements.
5.3 Building Blocks for Security Mechanism and Security Service Based on PETs
The term Privacy Enhancing Technologies (PETs) was defined by the Dutch and Ontario Privacy Commissioners in 1995, and the importance of technology in protecting privacy has been recognized ever since. Today it is still regarded by the European Union as one of the most important parts of the privacy protection spectrum, since it covers the broad range of technologies designed to support privacy and data protection [15]. In this part, we analyze the key building blocks related to the security mechanisms and security services of the proposed architecture, based on PETs.
Data Minimization. The purpose of data minimization is to limit the volume and scope of collected personal data. It requires that collected and processed data not be used or kept longer than is necessary for its original purpose, which means the data controller should limit the types and volumes of data collected. Additionally, data collected for one purpose cannot be repurposed without the users' further consent. Data minimization is referenced in five separate sections of the GDPR, which reflects its central significance in personal data protection; in fact, it is impossible to be GDPR-compliant without implementing data minimization rules and processes at every step of the data lifecycle. In smart cities, sensors in the perceptual network naturally gather more data than required for the envisioned task, referred to as "collateral data". For example, cameras installed for specific tasks such as face recognition or traffic surveillance also record unrelated information, like posture and other entities in the scene. Data minimization is therefore essential in the data collection process to shape how organizations collect and process personal data [3].
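Data minimization at collection time can be sketched as a purpose-bound whitelist: only the fields needed for the declared purpose are retained. The purposes and field names below are illustrative assumptions, not a prescribed schema.

```python
def minimize(record, purpose, allowed_fields):
    """Keep only the fields needed for the declared purpose.

    allowed_fields: dict mapping each purpose to the set of fields the
    data controller may collect for it (an illustrative whitelist).
    Unknown purposes yield an empty record, i.e. collect nothing.
    """
    keep = allowed_fields.get(purpose, set())
    return {k: v for k, v in record.items() if k in keep}
```

In the traffic-counting example, fields such as a license plate or a face identifier would be dropped at the edge, so the collateral data never enters storage.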
Differential Privacy. Differential privacy is a more modern approach to limiting malicious extraction, cited by Apple at the Worldwide Developers Conference (WWDC) in 2016 as a new technology for implementing privacy protection. It enables behavior pattern analysis of a user group without knowing the data of any individual. Differential privacy defines a very strict attack model and gives a rigorous mathematical proof with a quantitative representation of the risk of data leakage: even if an attacker knows all the sensitive information except one specific record, it is still impossible to infer any sensitive information about that record. It provides a semantic guarantee: no matter what background knowledge and authority the attacker has, only limited conclusions can be drawn from the known data, so the risk of data leakage is small. Meanwhile, adding or deleting a record in the dataset does not noticeably affect the output. Differential privacy is based on data distortion, inserting random noise with a specific distribution into the data to achieve protection. The amount of noise is not related to the size of the dataset, only to the sensitivity of the query over the dataset; widely used noise mechanisms include the Laplace method, the random noise adding method, etc. [36]. Therefore, for large datasets, adding only a small amount of noise can yield a high level of privacy protection. In this way, the data can be used to the maximum extent with minimum leakage risk. Differential privacy gained considerable support in the statistical database field at the beginning and is now widely applied in other smart city services.
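The Laplace mechanism mentioned above can be sketched for a counting query. A counting query has sensitivity 1 (adding or removing one record changes the answer by at most 1), so the noise scale is 1/epsilon; the sampling trick used below (difference of two exponential draws) is one standard way to draw Laplace noise. The concrete epsilon value is an illustrative assumption.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, rng):
    """Release a counting query result under epsilon-differential privacy.

    Sensitivity of a count is 1, so the Laplace scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Repeated noisy releases remain centered on the true count, so aggregate statistics stay useful even though any single answer hides the presence of one individual.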
Attribute-based Encryption. Compared with identity-based encryption, which represents a user's identity with a unique descriptive string, attribute-based encryption uses a set of descriptive attribute strings instead. A user is only able to decrypt a message if the set of attributes of his key matches the set of attributes of the ciphertext, which realizes fine-grained access control on encrypted data [22]. In smart cities, attribute-based encryption can be used to encrypt data for several groups of recipients who share common attributes, such as doctors, nurses and patients [33]. A hierarchical attribute-based encryption scheme can also be applied to classified personal data protection in the smart city [63].
Homomorphic Encryption. Once collected, personal data should be encrypted accordingly, and many mature encryption methods can achieve this. However, since the collected data still needs to go through further transmission and analysis, ordinary encryption methods cause unavoidable time and resource consumption, because data must be decrypted before analysis and re-encrypted afterward. Homomorphic encryption is proposed to solve this issue. Partially homomorphic cryptosystems are computationally feasible today, but they allow either addition (e.g., the Paillier cryptosystem) or multiplication (e.g., the ElGamal cryptosystem), not both. Fully homomorphic cryptosystems do not have this restriction: any operation on plaintext can also be performed on the encrypted data without decryption, thus protecting confidentiality during data processing. The additional value of homomorphic encryption in the smart city is that it enables third parties to process sensitive personal data without learning the input or output of the computations [17].
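The additive property of the Paillier cryptosystem can be demonstrated with a toy implementation over small primes. This is for illustration only: real deployments use keys of thousands of bits and a vetted cryptographic library, never hand-rolled code.

```python
import math
import random

def paillier_keygen(p, q):
    """Toy Paillier keypair from two small demo primes (never use in production)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                      # standard simple generator choice
    mu = pow(lam, -1, n)           # valid because g = n + 1
    return (n, g), (lam, mu, n)

def paillier_encrypt(pub, m, rng):
    n, g = pub
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:     # r must be invertible mod n
        r = rng.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def paillier_decrypt(priv, c):
    lam, mu, n = priv
    l = (pow(c, lam, n * n) - 1) // n    # the L function: L(x) = (x - 1) / n
    return (l * mu) % n
```

The homomorphism is that multiplying two ciphertexts modulo n squared yields a ciphertext of the sum of the plaintexts, so a third party can aggregate encrypted sensor readings without ever decrypting them.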
Secret Sharing. The principle of secret sharing is to distribute secret information among reliable participants, so that the secret can be revealed only when a sufficient number of shares are combined. Typically, the secret is split into n shares, each participant receives one share, and a minimum of t shares is required to recover the secret. Thus, secret sharing provides both confidentiality (a single share does not allow recovery of the secret) and reliability (n-t shares can be lost without affecting recovery). In smart cities, secret sharing can be used in solutions for data aggregation or distributed data storage; there are already studies on the application of secret sharing in smart city sensor networks [28] and in distributed data storage [46].
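Shamir's (t, n) threshold scheme is the classic realization of this principle; a compact sketch over a prime field follows. The field modulus and parameters are illustrative choices.

```python
import random

PRIME = 2 ** 61 - 1    # prime field modulus; all arithmetic is mod PRIME

def split_secret(secret, n, t, rng=None):
    """Split secret into n shares so that any t of them recover it."""
    rng = rng or random.Random()
    # random polynomial of degree t-1 whose constant term is the secret
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any t shares determine the degree t-1 polynomial and hence the secret, while t-1 shares reveal nothing, which gives exactly the confidentiality and reliability trade-off described above.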
Secure Multi-party Computation. Secure multi-party computation is a cryptographic method for privacy-preserving joint computation among a group of mutually distrusting participants, without relying on a trusted third party. It ensures the independence of each input and the correctness of the computation, while not revealing the input values of the other members involved. In smart cities, secure multi-party computation can be used to design healthcare solutions, for example to compute the results of genomic tests in which both the patient's genome and the test sequence remain private [59].
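A minimal sketch of one secure multi-party computation primitive, an additively masked secure sum, is shown below. Each party's value is split into random shares distributed among all parties, so no single aggregator ever sees a raw input; the modulus is an illustrative choice, and a real protocol would also handle communication and malicious behavior.

```python
import random

def secure_sum(private_values, modulus=2 ** 32, rng=None):
    """Additive-masking secure sum over several parties.

    Each party splits its value into one random share per party, so
    every party only ever holds meaningless random-looking shares;
    summing all share totals reconstructs the joint sum (mod modulus).
    """
    rng = rng or random.Random()
    parties = len(private_values)
    totals = [0] * parties         # share totals held by each party
    for value in private_values:
        shares = [rng.randrange(modulus) for _ in range(parties - 1)]
        shares.append((value - sum(shares)) % modulus)   # shares sum to value
        for i, s in enumerate(shares):
            totals[i] = (totals[i] + s) % modulus
    return sum(totals) % modulus
```

This is the idea behind privacy-preserving aggregation of, for example, household energy readings: the utility learns the district total without learning any single household's consumption.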
PETs also cover a wide range of privacy-protecting technologies and methods not included in the proposed architecture or in this paper, such as data anonymization, which contributes most to the protection of personally identifiable data; zero-knowledge proofs, which are widely applied in blockchain privacy protection; and data desensitization, which can deform and hide the sensitive information in personal data without changing its original format, ensuring that applications operate normally during development and testing on desensitized data. As described before, the security system in a smart city can be regarded as three-dimensional; with the gradual expansion of the network and the enrichment of the security scope driven by the continuous development of the smart city, new services and mechanisms can be added and deployed along the axes according to specific design requirements.
5.4 Evaluation
To conclude and evaluate: the proposed smart city security architecture is designed on a hierarchical S-MIS structure and also follows the privacy principles in ISO/IEC 29100 and ISO/IEC 27000. In comparison, a traditional MIS is usually based on a tree structure, a synchronous mode with multi-level subsystems realizing the function design. The main features of the proposed architecture are that it is hierarchical, data-oriented and modularized. It has the following advantages:
The hierarchical design of the system is supported, which allows the security system to cover every layer of the smart city architecture set by the ITU;
The modularized design for different security mechanisms and services can be achieved;
Moreover, considering that the environment and data flows in a smart city are highly dynamic and the types of services and applications expand day by day, the proposal to extend the security system from a layered structure into a three-dimensional structure is very constructive. The added values are:
Different combinations of security mechanisms, security services and security technologies can be formed according to various protection demands and requirements. For example, within the three-dimensional space, the selected point (compute security, data layer, access control) indicates that compute security in the data layer can be achieved through the realization of access control;