In a digital open-source investigation, the focus is usually on particular individuals or entities, but it is just as valuable to understand the network they are part of. This is where network analysis comes in.
Network analysis is the process of looking at the connections between people and entities to gain insights for your investigations. You can uncover information hidden in plain sight by analyzing a dataset and the network it describes.
Dataset analysis is particularly useful not only in investigative journalism but also in the art of storytelling. With network analysis, journalists can expose controversial connections and clusters of influence that connect the dots between relationships and power, and that would otherwise go unseen.
Why network analysis is useful for digital investigations
Network analysis is both a data journalism technique and an open-source intelligence (OSINT) technique. It is not yet widely applied in local journalism because of limited access to the tools needed to carry out a digital analysis properly. These tools are often not free and have only become accessible in the last few years.
Network analysis can be useful to generate or check information by revealing patterns between social entities. It is valuable in communicating a story and allows the reader to explore a system of social influence. These influences could be political, social, economic, or business.
In journalistic practice, it can show connections between people by looking at commonalities across fields such as education, employment history, or religious groups. It can also reveal relationships between companies and the directors within a company.
On social media, it can show the degree of density of relationships between influencers, cliques, power brokers and the outliers. Density is determined by the roles they play, who is the centre of influence, who acts as a bridge between different groups, and who is on the margins. It can also show a transition, revealing how much has changed over a particular time.
Network analysis rests on two main ideas: nodes and connections. Nodes, or elements, are the entities you are interested in at a given time, and they must be measurable. They could be documents or locations, individuals or organisations. Connections, or edges, are the relationships between the nodes.
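To make these two ideas concrete, here is a minimal Python sketch that stores a small network as a list of edges, then counts how many connections each node has and how dense the network is. The names and relationships are invented purely for illustration; in a real investigation they would come from documents or a dataset.

```python
from collections import defaultdict

# Hypothetical edges: each pair is one undirected relationship between two nodes.
edges = [
    ("Ada", "Acme Ltd"),
    ("Bola", "Acme Ltd"),
    ("Bola", "Chidi"),
    ("Chidi", "Delta Holdings"),
]


def build_adjacency(edge_list):
    """Map every node to the set of nodes it is directly connected to."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree(adjacency):
    """Number of direct connections per node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def density(adjacency):
    """Share of all possible undirected connections that actually exist."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    actual = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    possible = n * (n - 1) / 2
    return actual / possible


adjacency = build_adjacency(edges)
degrees = degree(adjacency)
network_density = density(adjacency)
```

Nodes with a high degree are candidates for the centre of influence, while a low density suggests a loosely connected network.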
One well-known example applies network analysis and visualization to the characters of Game of Thrones, a popular fantasy book series and television show.
Network of Thrones
Other areas where network analysis is applicable:
Performing network analysis is a good way to reveal digital connections. It is particularly helpful for investigations into companies, such as the FinCEN Files and the Panama Papers investigations by the International Consortium of Investigative Journalists (ICIJ), to which the Premium Times Centre for Investigative Journalism belongs. On the ICIJ website, you can explore a network data map that shows information about transactions that financial institutions have flagged to United States authorities as suspicious. FinCEN is the US Financial Crimes Enforcement Network, a bureau of the US Treasury Department tasked with combating financial crimes.
Another area where network analysis is often applied is investigating social media data, especially in the context of information campaigns, as well as mis- and disinformation. Network analysis can discover various patterns of inauthentic behaviour in social media datasets.
Investigative journalists at Bellingcat researched the account of the head of the World Health Organization. They observed that he was repeatedly targeted with memes, all of them very similar. Through network analysis and visualization they discovered that only a few Twitter accounts were sending out those memes over and over again, which most likely pointed to some form of coordinated network activity.
Admittedly, these are complex examples, but sometimes network analysis can be as simple as collecting a list of creation dates from several Twitter accounts.
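As a rough sketch of that simpler approach, the Python snippet below groups a hand-collected list of account creation dates by day and flags any day on which several accounts were created. The handles, dates and the threshold of three accounts are all invented assumptions. A cluster of creation dates is only a lead worth checking, not proof of coordination.

```python
from collections import Counter
from datetime import date

# Hypothetical accounts and creation dates, collected by hand.
accounts = {
    "@account_a": date(2020, 3, 14),
    "@account_b": date(2020, 3, 14),
    "@account_c": date(2020, 3, 14),
    "@account_d": date(2017, 6, 2),
    "@account_e": date(2019, 11, 30),
}


def creation_clusters(created, min_accounts=3):
    """Return the days on which at least `min_accounts` accounts were created."""
    counts = Counter(created.values())
    return {
        day: sorted(handle for handle, d in created.items() if d == day)
        for day, n in counts.items()
        if n >= min_accounts
    }


clusters = creation_clusters(accounts)
```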
Challenges Faced in the Practice of Network Analysis in Journalism:
Jonathan Stray, a fellow and lecturer at the School of Journalism, Columbia University, and author of Network Analysis in Journalism: Practices and Possibilities, cites a few challenges of network analysis in journalism. The first challenge, he says, is the cost of acquiring data, which is very often expensive, as some documents are charged per page. Then there is the record linkage problem, that is, deciding whether records in different datasets refer to the same person or entity. Here graph queries may not be helpful, even though they are better than algorithmic techniques at investigating diverse data sets.
There is also the issue of context. Most of the information needed to interpret the network is not contained in the network itself, and the story, in reality, is more complex than simple computation. So, in journalistic practice, the reporter uses the network as a sort of conceptual map for a story.
Furthermore, journalists do not have all the data needed for analysis at once, which means that they build the network as they report the story, becoming familiar with the data along the way. For this reason, graph databases have emerged as a flexible technology for investigating diverse data sets.
However, graph queries are hard to use, and not all investigative reports can be expressed in that form. Stray advises journalists to think of networks as narratives and to hand-build the graphs, adding nodes one at a time. In his words, “there is a deep conceptual distinction between the data that is in the database and the visualizations that the reporters are using to actually report the story. It is not about the algorithms but about the actual flow of the data that journalists are using.”
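One way to picture hand-building a graph during reporting is the small Python sketch below, where each node and connection is added as it is confirmed and every connection carries a note on the evidence behind it. The class name StoryGraph, its methods and the sample entries are hypothetical and do not come from Stray's work or from any particular tool.

```python
class StoryGraph:
    """A hand-built network that grows as the reporting progresses."""

    def __init__(self):
        self.nodes = {}  # node name -> free-text note
        self.edges = []  # (source, target, relationship, evidence)

    def add_node(self, name, note=""):
        # Keep the first note if the node was already recorded.
        self.nodes.setdefault(name, note)

    def add_edge(self, source, target, relationship, evidence):
        # Adding a connection also records both of its nodes.
        self.add_node(source)
        self.add_node(target)
        self.edges.append((source, target, relationship, evidence))

    def connections_of(self, name):
        """Every recorded connection that involves the given node."""
        return [edge for edge in self.edges if name in (edge[0], edge[1])]


# Invented entries, added one at a time as a story develops.
graph = StoryGraph()
graph.add_node("Official A", note="subject of the story")
graph.add_edge("Official A", "Company B", "director of", "company registry extract")
graph.add_edge("Company B", "Contract C", "awarded", "procurement notice")
```

Keeping the evidence next to each connection makes it easier to check every link before publication.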
The Panama Papers is an example of how graph databases can be applied to journalism analysis in future. Although he says they were not used to find paths between nodes in the Panama Papers investigation, Stray describes a workflow in which data is gathered and fused, queries are run on it and, finally, an interactive visualization is produced. He calls this the one-graph-to-rule-them-all approach, which he believes will help investigative journalism with very complicated stories. The limitation of this method is that data must be in a structured form. According to him, the bulk of the Panama Papers were documents that never had entities extracted; all of the graphical representations of the Panama Papers network were derived from structured data only.
Performing network analysis using data from social networks
Network analysis is useful for finding out who is at the centre of a discussion on social networks and how accounts in a specific dataset are connected.
It is important to identify what types of relationships you would like to investigate, who and what intersect with them, and where and how they might be measured. These relationships can range from the closest interactions within a network, such as a parent, colleague or follower, to shared experiences such as payments or financial transactions, joint appearances or correspondence.
Relationships are only measurable once they show a quantifiable link to the entity being investigated. They can reveal echo chamber effects on social networks like Twitter. An echo chamber effect on Twitter refers to the tendency to follow or retweet only those with whom you agree.
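One simple, hedged way to put a number on an echo chamber is to measure what share of retweets stay within the same group. The Python sketch below assumes you already have a list of retweets and a group label for each account; the accounts, labels and retweets here are invented, and assigning the labels in the first place is a separate editorial judgement.

```python
# Hypothetical data: (retweeter, original author) pairs and a group label per account.
retweets = [
    ("a1", "a2"),
    ("a2", "a1"),
    ("a1", "a3"),
    ("b1", "b2"),
    ("b2", "b1"),
    ("a3", "b1"),
]
group = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}


def within_group_share(edges, labels):
    """Fraction of retweets where both accounts carry the same group label."""
    labelled = [(s, t) for s, t in edges if s in labels and t in labels]
    if not labelled:
        return 0.0
    same = sum(1 for s, t in labelled if labels[s] == labels[t])
    return same / len(labelled)


share = within_group_share(retweets, group)
```

A share close to 1 means accounts almost only retweet their own group, which is consistent with an echo chamber; a value near the share expected by chance suggests more mixing.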
Journalists can create an ongoing power structure analysis to find connections between politicians, advocates and business people using network analysis tools. In the end, all record linkages have to be reviewed by a human before they can be used in published work.
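The sketch below illustrates that review step in Python. It compares names from two invented lists using the standard library's difflib and returns only candidate matches for a person to confirm or reject. The normalization rules, the 0.85 threshold and the sample names are assumptions for illustration, and real record linkage usually needs more fields than a name.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation and unify a common company suffix."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = ["ltd" if token == "limited" else token for token in cleaned.split()]
    return " ".join(tokens)


def candidate_matches(list_a, list_b, threshold=0.85):
    """Return (name_a, name_b, score) pairs similar enough to need human review."""
    candidates = []
    for a in list_a:
        for b in list_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                candidates.append((a, b, round(score, 3)))
    # Strongest candidates first, so the reviewer sees them at the top.
    return sorted(candidates, key=lambda item: item[2], reverse=True)


# Hypothetical names from two different sources.
registry = ["Acme Holdings Ltd.", "Jon Okafor", "Delta Ventures"]
leak = ["ACME HOLDINGS LIMITED", "John Okafor", "Sunrise Farms"]

to_review = candidate_matches(registry, leak)
```

Nothing in the output is a confirmed match: each pair is a suggestion that a reporter still has to verify against the underlying records.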
Tools for Network analysis:
In journalistic practice, network data is often extracted from unstructured documents such as public records and then presented as visualizations. Data visualization makes it easier to spot patterns and connections and is a popular way to present results. Many of the tools involved in analysis are data visualization tools, but there are also tools for graphing and mapping. Some tools, like Kumu, allow for manual mapping and drawing. Here are a few of them:
Netlytic: This is a cloud-based text and social network analyzer. It allows you to collect and visualize tweets. Without its advanced features, it is limited in the size of the datasets it can handle, but it is a great starting point: Netlytic allows you to import a thousand tweets from the last seven days.
Maltego: This is a tool for showing the connections between nodes. It gathers information on the entities being investigated while providing a graphical link analysis of datasets. Here is a simple tutorial guide.
Kumu: This tool is useful for mapping. It allows you to manually draw a network and its connections and stores the data in the process.
Neo4j: This is the tool used in the analysis of the Panama Papers and the Bahamas Leaks investigations by the International Consortium of Investigative Journalists (ICIJ). It is useful in processing large datasets. Here is a tutorial for journalists.
Gephi: Gephi is a free, open-source tool that provides social network analysis, link analysis, data analysis and graphical presentation.
Linkurious: This tool brings together analytics, graphs and visualization, which removes the difficulty of tracking scattered information when detecting and investigating threats.
Graphistry: Graphistry automatically transforms data into interactive visual investigation maps while linking events and entities for analysis.
The researcher produced this fact-check per the Dubawa 2020 Fellowship partnership with NewsWireNGR to facilitate the ethos of “truth” in journalism and enhance media literacy in the country.