Faster Learning Control Schemes Can Help To Address Cybersecurity Economically

We have many computer “hosts” in our lives, from personal computers to printers to cell phones to smart refrigerators. These hosts carry so-called “vulnerabilities,” such as buggy software and weak passwords. Vulnerabilities can both be identified by scans and be exploited by malware or human hackers who have “exploits” that can “see” them.

Fortunately, patches or security updates can often eliminate vulnerabilities. Even when patches do not yet exist, vulnerabilities can be addressed by removing access or hiding the hosts behind additional firewalls. When patches do exist, they may require testing to make sure they do not harm host performance, as well as manual actions from system administrators. Yet, with large numbers of hosts and tens of thousands of possible vulnerabilities, which actions are cost-justified?

The figure below (a) shows three hosts having different sets of vulnerabilities. In the situation depicted, a vulnerability of the printer is being exploited (the star), causing an incident. In such an incident, an expensive investigation is launched, and any users whose information is found to be breached in the printer’s buffer must be notified. All vulnerabilities are given severity ratings by the Common Vulnerability Scoring System (CVSS). Some vulnerabilities are so important that they get names, such as “Heartbleed,” and even logos.

The figure below (b) shows a common policy in which “no action” (Action 1) is used for hosts with only low or medium vulnerabilities (State 1 and State 2, respectively) and “attempt to patch” (Action 2) is used in State 3 or State 4 (high or critical vulnerabilities on the host, respectively). A major problem for policy development is that there is little or no experience applying Action 1 in States 3 or 4, or Action 2 in States 1 or 2.

Figure 1: (a) Hosts with vulnerabilities at different severity levels and (b) system states and actions. Image courtesy Theodore Allen.
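The threshold structure of this policy is simple enough to sketch in a few lines of code. This is a hypothetical illustration with made-up labels, not the authors’ implementation:

```python
# A minimal sketch of the threshold policy described above. Each host's
# state is the severity of its worst vulnerability (hypothetical labels).

def policy(state: str) -> str:
    """Map a host's worst vulnerability severity to an action."""
    if state in ("low", "medium"):       # States 1 and 2
        return "no_action"               # Action 1
    if state in ("high", "critical"):    # States 3 and 4
        return "attempt_to_patch"        # Action 2
    raise ValueError(f"unknown state: {state}")

print(policy("medium"))    # no_action
print(policy("critical"))  # attempt_to_patch
```

Because the policy never tries Action 2 in States 1 or 2 (or Action 1 in States 3 or 4), operating under it generates no data about those state-action combinations.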

Costs in cybersecurity include repair costs associated with efforts to patch or remediate vulnerabilities and incident investigation costs. Further, there are potential direct costs relating to losses of information, such as purchasing insurance for all the people whose information was lost. There could also be espionage costs as competitors gain information. Costs not relating to repair and investigation are difficult to quantify and may be rarely incurred. Therefore, in our work, we focus on direct costs so that our artificial intelligence minimizes only the observable costs in our formulation.


How can an artificial intelligence recommend patching or remediation actions when it has little data for some combinations of states and actions? The answer is that one needs a “learning” control system that values experimentation. So-called “Bayesian Reinforcement Learning” (BRL) methods consider all possible actions in all system states and anticipate not only the system moving to new physical states but also the random acquisition of new knowledge, or “belief states.” Over time, the system moves with “intrinsic” or built-in uncertainty, while “parametric” uncertainty is reduced through learning.
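The belief-state bookkeeping behind BRL can be sketched with Bayes’ rule over a finite set of candidate transition models. The following is a minimal illustration with made-up numbers, not the authors’ formulation:

```python
import numpy as np

# Minimal sketch: beliefs over a finite set of model scenarios. Each
# scenario specifies transition probabilities P_m(s' | s) for one action;
# observing a transition reweights the scenarios by Bayes' rule.
rng = np.random.default_rng(0)
n_states = 4

# Two candidate transition matrices (each row is a distribution over s').
models = [rng.dirichlet(np.ones(n_states), size=n_states) for _ in range(2)]
belief = np.array([0.5, 0.5])  # uniform prior over the two scenarios

def update_belief(belief, models, s, s_next):
    """Posterior over scenarios after observing the transition s -> s_next."""
    likelihoods = np.array([m[s, s_next] for m in models])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

belief = update_belief(belief, models, s=0, s_next=2)
print(belief)  # scenarios that predicted the transition gain mass
```

The “belief state” is this vector of scenario probabilities; a BRL planner anticipates how future transitions will sharpen it.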

The figure below shows an imagined BRL intelligence that holds several models in mind but plans on its “future self” transitioning to new states and effectively eliminating some of the models. At the beginning, which models will be eliminated is not known, but some (almost surely) will be. Artificial intelligences that do not consider the benefits of future learning are conservative and not experimental, in that they underestimate the benefits of operating the system.

Image courtesy Theodore Allen.

There are many BRL control schemes. In our work, we focus on methods with finite model scenarios. Benefits include simplicity and an easy way to address learning both from observing state transitions and from observing the rewards or losses incurred. Yet, it is important to consider large numbers of model scenarios so that the benefits of learning are not overestimated, i.e., so that it is not unrealistically easy to eliminate all models except one. One of our innovations is to use so-called “variance-reduction” techniques, including “Latin Hypercubes” (LHC), to spread out the model scenarios so that fewer can safely be used. Also, learning can be sped up by using multiple systems at one time, which is one type of exact multi-task learning, as indicated by the figure below.
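A Latin hypercube design places exactly one sample in each of n equal-probability strata per parameter dimension, which is what spreads the scenarios out. Here is a toy sampler of our own for illustration, not the authors’ code:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Draw n points in [0, 1)^d with one point per stratum per dimension."""
    # One uniform draw inside each of the n equal strata, per dimension...
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    # ...then decouple the dimensions by permuting each column independently.
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]
    return u

rng = np.random.default_rng(42)
scenarios = latin_hypercube(n=10, d=3, rng=rng)
# Each of the 10 strata in each dimension contains exactly one scenario.
```

Each row of `scenarios` can then be mapped to one candidate model’s parameters, guaranteeing coverage of the parameter ranges even with few scenarios.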


When Duff (2002) proposed what were perhaps the first control schemes that can perfectly minimize expected future costs while learning, the optimality proof was assumed and not given. We give the proof. The basic idea is illustrated below: an intelligence with multiple models in a world in which its state is known is equivalent to a single-model intelligence in a larger world with an unknown position. The policy can then be developed using Partially Observable Markov Decision Processes (POMDPs), which were first solved optimally in 1973 by Smallwood and Sondik. POMDP methods are important in many types of artificial intelligence.

Image courtesy Theodore Allen.
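The equivalence can be made concrete by enumerating the hidden states of the larger POMDP: every (model scenario, physical state) pair becomes one hidden state, the physical component is observed exactly, and the model component never is. A toy enumeration with made-up labels:

```python
from itertools import product

# Two candidate model scenarios and four physical states (as in the
# severity states above) yield a POMDP with 2 x 4 = 8 hidden states.
models = ["model_A", "model_B"]
physical_states = [1, 2, 3, 4]

hidden_states = list(product(models, physical_states))
print(len(hidden_states))  # 8

# Observing physical state s rules out every pair whose state differs,
# so the remaining belief is exactly a belief over the model scenarios.
def consistent_with(s):
    return [h for h in hidden_states if h[1] == s]

print(consistent_with(3))  # [('model_A', 3), ('model_B', 3)]
```

In this view, the BRL belief over model scenarios is just the POMDP belief over hidden positions, so standard POMDP solvers apply.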

In developing learning control systems, one issue is the computational speed needed to generate the learning control policy. Another issue is the speed at which the developed policy eliminates parametric uncertainty. Policy features that we propose and study include LHC model scenario selection, reward-based learning, and exact multi-system learning. These features make it more difficult to train the control system. A question becomes: how can we measure the benefits in terms of faster learning? To answer this, we propose the Median Estimated Learning Time (MELT), the median number of periods until the top model scenario reaches a threshold level of probability. The figure below shows three simulations leading to a MELT of 3. Sometimes, a good control system might have a high MELT because a desirable policy can be developed without resolving which model is accurate. In our article, we illustrate how our control system can likely save millions of dollars in maintenance costs at a major university.

Image courtesy Theodore Allen.
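Computing the MELT from simulation output is straightforward: record, for each run, the first period at which the leading scenario’s posterior probability crosses a threshold, then take the median across runs. A sketch with made-up trajectories and an assumed threshold of 0.95:

```python
import numpy as np

THRESHOLD = 0.95  # assumed resolution threshold for this illustration

def first_crossing(top_probs, threshold=THRESHOLD):
    """First period at which the top scenario's probability >= threshold."""
    for t, p in enumerate(top_probs, start=1):
        if p >= threshold:
            return t
    return np.inf  # never resolved within the simulated horizon

# Three illustrative runs (posterior probability of the leading scenario).
runs = [
    [0.50, 0.80, 0.96, 0.99],  # crosses at period 3
    [0.40, 0.70, 0.97, 0.98],  # crosses at period 3
    [0.60, 0.90, 0.93, 0.99],  # crosses at period 4
]
melt = np.median([first_crossing(r) for r in runs])
print(melt)  # 3.0
```

A run that never crosses the threshold contributes infinity, which is consistent with the point above that a good policy may leave the models unresolved.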

These findings are described in the article entitled “Reward-based Monte Carlo-Bayesian reinforcement learning for cyber preventive maintenance,” recently published in the journal Computers & Industrial Engineering. This work was conducted by Theodore T. Allen and Enhao Liu from The Ohio State University and Sayak Roychowdhury from the Indian Institute of Technology.

About The Author

Theodore T. Allen

Theodore Allen is an associate professor at The Ohio State University in the Department of Integrated Systems Engineering. He does research in quality assurance engineering, manufacturing engineering, and industrial engineering. His current project is “Voting Machine Allocation and Apportionment.”
