Faster Learning Control Schemes Can Help To Address Cybersecurity Economically

We have many computer “hosts” in our lives, from personal computers to printers to cell phones to smart refrigerators. On these hosts are so-called “vulnerabilities” such as buggy software and weak passwords. Vulnerabilities are both identifiable in scans and exploitable by malware or human hackers if they have “exploits” that can “see” the vulnerabilities.

Fortunately, patches or security updates can often eliminate vulnerabilities. Even if patches do not yet exist, the vulnerabilities can be addressed by removing access or hiding the hosts behind additional firewalls. If patches do exist they may require testing, to make sure that they do not harm the host performance and manual actions from system administrators. Yet, with large numbers of hosts and tens of thousands of vulnerabilities possible, which actions are cost justified?


The figure below (a) shows three hosts having different sets of vulnerabilities. In the situation depicted, the vulnerability of the  printer is being exploited (the star) causing an incident. In such an incident, an expensive investigation is launched and any users whose information is found to be breached in the printer’s buffer must be notified. All vulnerabilities are given severity ratings by the common vulnerability scoring system. Some vulnerabilities are so important they get names such as “Heartbleed” and logos.

The figure below (b) shows a common policy such that the “no action” Action 1 is used for hosts with only low or medium vulnerabilities (State 1 and State 2 respectively) and the “attempt to patch” (Action 2) is used in State 3 or State 4 (high or critical vulnerabilities on the host, respectively). A major problem for policy development is that there is little or no experience applying Action 1 in states 3 or 4 or Action 2 in states 1 or 2.

Figure 1: Hosts with vulnerabilities at different severity levels and (b) system states and actions. Image courtesy Theodore Allen.

Costs in cybersecurity include repair costs associated with efforts to patch or remediate vulnerabilities and incident investigation costs. Further, there are potentially direct costs relating to losses of information such as purchasing insurance for all the people whose information was lost. There could also be espionage costs as competitors gain information. Costs not relating to repair and investigation are difficult to quantify and might be rarely incurred. Therefore, in our work, we focus on direct costs so that our artificial intelligence minimizes only the observable costs in the formulation below.



How can an Artificial Intelligence recommend patching or remediation actions when it has little data for some combinations of states and actions? The answer is that one needs a “learning” control system which values experimentation. The so-called “Bayesian Reinforcement Learning” (BRL) methods consider all possible actions in all system states and anticipate not only the system moving to new physical states but also the random acquisition of new knowledge or “belief states”. Over time, the system moves with “intrinsic” or built-in uncertainty and “parametric” uncertainty is reduced through learning.

The figure below shows an imagined BRL intelligence having several models in mind but planning on its “future self” transitioning to new states and effectively eliminating some of the models. At the beginning, which models will be eliminated is not known but some will (almost surely) be. Artificial Intelligences that do not consider the benefits of future learning are conservative and not experimental in that they underestimate the benefits of operating the system.

Image courtesy Theodore Allen.

There are many BRL control schemes. In our work, we focus on methods with finite model scenarios. Benefits include simplicity and an easy way to address learning both from observing state transitions and from observing the rewards or losses incurred. Yet, it is important to consider large numbers of model scenarios so that the benefits of learning are not overestimated, i.e., it is not unrealistically easy to eliminate all models except one. One of our innovations is to use so-called “variation-reduction” techniques including “Latin Hyper-Cubes” (LHC) to spread out the model scenarios so that fewer can safely be used. Also, learning can be sped up by using multiple systems at-one-time being one type of exact multi-task learning as indicated by the figure below.


When Duff (2002) proposed what were perhaps the first control schemes that can perfectly minimize expected future costs while it is learning, the optimality proof was assumed and not given. We give the proof. The basic idea is illustrated below. An intelligence with multiple models in a world in which its state is known is equivalent to a single-model intelligence in a larger world with an unknown position. Then, the policy can be developed using Partially Observable Markov Decision Processes (POMDP), which were first solved optimally in 1973 by Smallwood and Sondik. POMDP methods are important in many types of Artificial Intelligences.

Image courtesy Theodore Allen.

In developing learning control systems, one issue is the computational speed to generate the learning control policy. Another issue is the speed at which the developed policy eliminates parametric uncertainty. Policy features that we propose and study include the LHC model scenario selection, reward-based learning, and exact multi-system learning. These features make it more difficult to train the control system. A question becomes, how can we measure the benefits in terms of faster learning? To answer this, we propose the Median Estimated Learning Time (MELT). The MELT relates the average median number of periods until the top model scenario reaches a threshold level of probability. The figure below shows three simulations leading to a MELT of 3. Sometimes, a good control system might have a high MELT because a desirable policy can be developed without resolving which model is accurate. In our article, we illustrate how our control system can likely save millions of dollars in maintenance costs at a major university.

Image courtesy Theodore Allen.

These findings are described in the article entitled Reward-based Monte Carlo-Bayesian reinforcement learning for cyber preventive maintenance, recently published in the journal Computers & Industrial Engineering. This work was conducted by Theodore T. Allen and Enhao Liu from The Ohio State University and Sayak Roychowdhury from the Indian Institute of Technology.



Five Areas Of Research In Natural Product-Derived Drug Discovery

In the past few years, a considerable amount of drug discoveries related to natural products have decreased significantly. Big Pharma companies […]

How (And Why) Different Plant Species Are Distributed In Different Spots Around The Globe: A Case Study Of A Citrus Group In South American Tropical Rainforests

Ever since flowering plants appeared on Earth around the Early Cretaceous (between 130 and 90 million years ago) (1), they […]

Photoautotroph: Definition And Examples

A photoautotroph is an organism that can synthesize its own food source via sunlight and carbon dioxide. Photoautotrophs utilize energy […]

High School Calculus Not As Important As Mastering Prerequisites Study Finds

Those who are struggling to learn calculus in high school, or who did struggle with it in high school, may […]

Scientists Solve Key Flaw In Solar Panel Design That Has Reduced Efficiency For Decades

Scientists have recently identified a key structural problem in solar panels that reduced their efficiency. The silicon used to create […]

Scientists Develop Squirtable Glue That Seals Wounds In Seconds

Most wounds that require surgeries are done with stitches or staples employed to hold the wound shut. However, two new […]

Comparability Of Meteoric Water Lines: Daily, Monthly, Or Annual Data?

The stable isotopes of hydrogen (δ2H) and oxygen (δ18O) have been widely applied in hydrology. The equation relating δ2H to […]

Science Trends is a popular source of science news and education around the world. We cover everything from solar power cell technology to climate change to cancer research. We help hundreds of thousands of people every month learn about the world we live in and the latest scientific breakthroughs. Want to know more?