David is VP of Engineering at Fivecast, responsible for the practical application of software engineering best practices to digital intelligence solutions. He has more than 15 years of experience in senior technical roles with companies including BAE Systems and Tenix Defence.
The explosive growth of the Internet has been a blessing and a curse for Open-Source Intelligence (OSINT) analysts – on the one hand, they have access to an almost endless stream of data, leading to OSINT playing an increasingly important role in intelligence gathering. On the other hand, information overload is a real concern, and simply searching masses of data looking for anything interesting will inevitably lead to burnout. Thankfully, the data volume challenge can be mitigated by applying a clear OSINT strategy and modern OSINT tools. Automated, user-driven data discovery, collection, and risk analysis can arm OSINT analysts with significant analytical power, allowing them to focus on their strengths – defining their mission and using context and nuance when interpreting the data.
The Rise of AI and Machine Learning
Many recent advancements in OSINT tools have been driven by improvements in AI and machine learning. In particular, content understanding for text and multimedia has allowed AI to surface interesting insights from large volumes of raw, unstructured data for analysts to review.
However, like many industries, the intelligence sector needs to ensure that the AI and machine learning used to uncover critical insights and protect communities also preserves human rights. While machine learning techniques can be a powerful component of an OSINT strategy, it’s important their application is analytically sound, protects privacy, and respects all laws.
Techniques for Ensuring Ethical Use of AI in OSINT
At Fivecast, the global customers we work with in law enforcement, defense, and national security are united in their focus on leveraging OSINT technology solutions in an ethical way. We have implemented a number of techniques with our customers to ensure this can be achieved, which include:
Understanding privacy legislation for particular jurisdictions and specific operational procedures for OSINT collection
Provisions are usually made for the collection of open-source data for law enforcement and national security use cases, but these often come with limitations. The Fivecast ONYX solution supports targeted, user-driven collection and granular control over data retention to allow users to align data collection with their organization’s remit and policies. The overriding policy should be to collect only information that is directly relevant to the investigation. Clearly defining the investigation's purpose, and using a technology solution that supports targeted data collection specifically related to that purpose, is important for keeping any investigation ethical.
Understanding the limitations of machine learning models
It’s important that AI-driven tools present the output of analytics appropriately so that users don’t draw conclusions that aren’t supported by the data. While analytics driven by AI are powerful and essential to uncovering critical insights, the intelligence analyst should always be able to apply their tradecraft expertise to the risk analysis and retain control over investigation outcomes.
Having a reliable and accurate data provenance process in place
This becomes increasingly important with the use of AI in OSINT. Tracking information sources and having the ability to link back to the source data allows analysts to manually verify the output of AI when required. For mission-critical decisions, AI is best applied as a decision support tool, keeping the human analyst in the loop.
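As a rough illustration of linking AI output back to its source, the sketch below attaches a source record to every finding and lets an analyst check that the content under review is the same content the model saw. It is a minimal example under stated assumptions, not a description of how Fivecast ONYX stores provenance; the class names, field names, and example URL are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRecord:
    """Where a collected item came from (field names are illustrative)."""
    source_url: str
    collected_at: datetime
    content_sha256: str


@dataclass(frozen=True)
class Finding:
    """An AI-generated output that always carries a link to its source."""
    label: str
    score: float
    source: SourceRecord


def _digest(content: str) -> str:
    # Hash the raw content so later edits to the source can be detected.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def record_source(source_url: str, content: str) -> SourceRecord:
    """Capture the URL, a UTC collection time, and a hash of the content."""
    return SourceRecord(source_url, datetime.now(timezone.utc), _digest(content))


def matches_source(finding: Finding, content: str) -> bool:
    """True if the content being reviewed is what the model analyzed."""
    return _digest(content) == finding.source.content_sha256
```

Because a finding cannot be constructed without a source record, an analyst reviewing a flagged item can always return to the original material and confirm it has not changed since collection.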
Being constantly vigilant of privacy considerations
Be aware of potential privacy violations and biases when collecting training data for machine learning models. It is often possible to train on parallel, open corpora, or to adapt existing pre-trained models to a new domain, rather than collecting new bulk data.
Formulating the intelligence question up front and designing automated collection tools to discourage overreach when more data won’t help
Having a clear investigation objective and mission encourages analysts to look for evidence rather than search for unknowns. Fivecast ONYX supports this through reusable and customizable risk detectors, which can capture the “question” before data collection begins. Defining the question and configuring risk detectors up front not only discourages analysts from collecting irrelevant data, but also enables masses of white noise to be filtered out quickly so that analysts can focus on the riskiest content relevant to their investigation.
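To make the idea of capturing the question up front more concrete, here is a minimal sketch in which each detector is defined before any data is examined and anything that matches no detector is discarded. Simple keyword matching stands in for whatever analytics a real product uses; the `RiskDetector` class and `filter_relevant` function are hypothetical and do not represent the Fivecast ONYX API.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class RiskDetector:
    """A reusable 'question': a name plus the terms that signal relevance."""
    name: str
    terms: FrozenSet[str]

    def matches(self, text: str) -> bool:
        # Tokenize on letters, digits, and apostrophes, ignoring case.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        return not self.terms.isdisjoint(words)


def filter_relevant(
    items: Iterable[str], detectors: List[RiskDetector]
) -> List[Tuple[str, List[str]]]:
    """Keep only items that match a detector, tagged with detector names."""
    kept = []
    for text in items:
        hits = [d.name for d in detectors if d.matches(text)]
        if hits:
            kept.append((text, hits))
    return kept
```

Items that match nothing are never retained, which mirrors the principle of collecting only what is relevant to the stated investigation purpose.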
Paying attention to culture
The culture of an intelligence organization has an important role to play in the ethical use of AI. Encouraging a culture that values insights and results while holding privacy protection as a core value is essential. Discourage misuse through internal policies, and back them up with audit trails and admin features that provide visibility into, and accountability for, the use of digital intelligence solutions.
As intelligence teams grapple with the volume of data available and aim to use it to their advantage in investigations, the use of AI and machine learning will only increase. Indeed, here at Fivecast, we have a strong focus on continuing to provide customers with better ways to uncover big data insights with AI. However, the most successful intelligence organizations will ensure analysts get the most value from AI-driven tools while remaining ethical and respecting laws and policy in their application.