ExtraHop open sources massive machine learning dataset

Thu, 14th Sep 2023

ExtraHop, the cloud-native network detection and response (NDR) specialist, has announced it is open sourcing its 16-million-row dataset, which it describes as one of the most robust available, to help defend against domain generation algorithms (DGAs).

The move aims to level the playing field for defenders and empower businesses of all sizes to better secure their organisations by strengthening defences against malware and botnet operations.

The cyber landscape is evolving rapidly amid a widening cybersecurity skills gap (up 26% in the last year) and dwindling resources. As new threats appear, open source research and datasets can help security teams overcome the challenges they face on a daily basis, the company states.

Raja Mukerji, Chief Scientist and Co-Founder, ExtraHop, says, "The challenges we face in security are formidable and dynamic, and, with this initiative, we're democratising the tools needed for threat research detection for security teams of all sizes, backgrounds, and industries."

"Collaboration among the cybersecurity community is invaluable - coming together to share our best work is the only way to remain on the offense and put attackers at a disadvantage. Our research will be a gamechanger for the community and we encourage other teams to open source their own insights that will similarly benefit the industry at large."

Striving for industry collaboration, ExtraHop is releasing its DGA detector dataset, made up of more than 16 million rows of data, on GitHub to help security teams identify malicious activity in their environments before it becomes a business problem.

Threat actors use DGAs to maintain control within an organisation's environment after gaining entry to a network, which makes attacks difficult to detect and stop.
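To illustrate the general idea, here is a minimal Python sketch of a toy DGA. It is not taken from ExtraHop's research or from any real malware family; the function name `toy_dga`, the seed string, the hashing scheme and the reserved `.example` suffix are all assumptions chosen for illustration. It shows why such domains are hard to block one by one: anyone holding the same seed can regenerate the same list for a given day, while the names look random to everyone else.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".example") -> list[str]:
    """Derive a deterministic list of pseudo-random domains (illustrative only)."""
    domains = []
    for i in range(count):
        # Seed, date and counter are hashed together, so the list changes daily
        # but can be reproduced by anyone who knows the seed.
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to a letter in the range a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("hypothetical-seed", date(2023, 9, 14)):
        print(name)
```

Because the output depends only on the seed and the date, blocking yesterday's domains does nothing about tomorrow's, which is why defenders tend to classify the shape of a domain name rather than rely on static blocklists.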

Originally built for ExtraHop's award-winning NDR platform, Reveal(x), this research can now be used by any security researcher to construct their own machine learning (ML) classifier model to more quickly identify DGAs and intervene in attacks with greater speed and precision. Since its implementation in Reveal(x), the ExtraHop DGA model has demonstrated more than 98% accuracy.
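As a rough sketch of where a researcher might start, the Python snippet below computes a few simple lexical features from a domain name that could be fed into a classifier. The article does not describe the dataset's schema or the features ExtraHop's model uses, so the feature choices and the helper names `shannon_entropy` and `domain_features` are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the characters in text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def domain_features(domain: str) -> dict[str, float]:
    """Hypothetical lexical features computed on the first label of a domain."""
    label = domain.lower().split(".")[0]
    size = len(label)
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": float(size),
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / size if size else 0.0,
        "vowel_ratio": vowels / size if size else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "xkqjzvbnwplr.example"):
        print(name, domain_features(name))
```

Algorithmically generated names often, though not always, show higher character entropy and fewer vowels than human-chosen ones. Features like these would be computed for each row of a labelled dataset and passed to a classifier of the researcher's choice.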

Todd Kemmerling, Director of Data Science, ExtraHop, says, "Giving threat actors the ability to operate undetected and an uptick in these types of attacks, DGAs are increasingly considered a major threat to businesses today."

"As we began developing a model for detecting DGAs, it became apparent there was a lack of public datasets accessible to security teams with a wide-ranging set of resources. With this dataset, we are filling that gap, giving any security team access to the pivotal data needed to detect DGAs swiftly."

ExtraHop is a cybersecurity partner for enterprises. The company's Reveal(x) 360 platform is a network detection and response platform that delivers the 360-degree visibility needed to uncover the cybertruth.
