Breach, leak or scrape? There can still be data misuse


If we have paid any attention to cybersecurity events, we probably think of them as unwanted cyber breaches: a malicious attacker exploits a vulnerability to insert code that steals data, or compromises that data to their advantage. Cybersecurity is the practice of safeguarding this data.

Privacy is very important as well. Data privacy has the notion of individual rights tied to it. But it becomes more difficult to keep our personal data under lock and key as more and more of it gets digitised. Most of us give it away in exchange for freebies or services.

Some services will not work unless we hand over information such as our email addresses or phone numbers. Most of us do not think twice about posting our thoughts, activities and feelings on social media, all in the name of keeping our contacts updated. It is very difficult to define privacy in today’s world, especially in a digital context.

The above is my very simple understanding of cybersecurity and data privacy.

And when it comes to the definition of privacy, I think the lines are very blurred.

But the reason I bring these up is to introduce the concepts of breaches, leaks and scrapes.

Clubhouse … Breach? Leak? Scrape?

In early April, news sites reported that the drop-in audio chat service Clubhouse had been “breached” and that 1.3 million user records had been made public.

Those who wished to be precise said “breached” was too strong a word. The data was already publicly available and could be accessed and viewed via Clubhouse’s API. Nobody had stealthily infiltrated its database to take all of that data.

During a SecurityLah! Podcast episode, Nigel pointed out that there was no unauthorised access to the data, while Skywalker said, “We cannot blame people for data scraping. You kept your car door open, don’t blame the guy who comes along to drive it away.”

Clubhouse’s system or database had not been breached, and to say that it had implies that this was a cybersecurity event or incident.

Instead, Clubhouse’s user data had been scraped, or harvested, and collated into unknown third-party databases for unknown reasons. Scraping is not strictly illegal… yet.

The privacy-cybersecurity expectation

Truth be told, when I signed up for Clubhouse, I had minimal expectations of privacy, especially from a new startup. I willingly shared my phone number and created a username to be able to use the service.

That’s all I shared, right? Wrong.

Later, it was shared that the user-related info found in Clubhouse’s database included the following.

  • User ID
  • Name
  • Photo URL
  • Username
  • Twitter handle
  • Instagram handle
  • Number of followers
  • Number of people followed by the user
  • Account creation date
  • Invited by user profile name

So far, maybe no harm, no foul.

But then, huge dumps of Clubhouse user data started to show up on Internet forums, for anyone to do whatever they wish with it.

How do we feel now? Perhaps still cool.

Data misuse and abuse

What if it is later revealed that some parties took the opportunity to monetise all of that user data? The type of data leaked, when viewed in isolation, may not mean very much.

But let’s not forget that technologies such as data analytics and artificial intelligence can easily make sense of all that data to create a profile of you, your actions, habits, preferences and more.

Businesses use these technologies to better understand us as customers. Why wouldn’t cybercriminals use these same technologies to better understand us as their potential victims?

If the same cybercriminal buys leaked user data from Clubhouse, LinkedIn and Facebook, the picture they get of their potential victims becomes pretty complete.

(Sure enough, over a billion Facebook and LinkedIn profiles were found to be out on the Internet before the Clubhouse event. These were easier to identify as data leaks because of the nature of the information released.)
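To make the point concrete, here is a minimal sketch of how little effort it takes to stitch separate dumps into one profile. Everything in it is made up for illustration: the records, the field names, the `merge_profiles` helper, and the assumption that all three sources happen to share a Twitter handle.

```python
# Entirely fictional records standing in for three separate data dumps.
# Assumption: every source exposes the same Twitter handle.
clubhouse = [{"username": "alex_t", "twitter": "@alex_t", "invited_by": "Sam"}]
linkedin = [{"twitter": "@alex_t", "employer": "Example Bank",
             "job_title": "Finance Manager"}]
facebook = [{"twitter": "@alex_t", "phone": "+00 000 0000",
             "home_town": "Exampleville"}]


def merge_profiles(join_key, *datasets):
    """Combine records from several datasets that share the same join_key value."""
    profiles = {}
    for dataset in datasets:
        for record in dataset:
            key = record.get(join_key)
            if key is None:
                continue  # this record cannot be linked to anything
            # Fold this record's fields into the profile built so far.
            profiles.setdefault(key, {}).update(record)
    return profiles


profiles = merge_profiles("twitter", clubhouse, linkedin, facebook)
print(sorted(profiles["@alex_t"]))
```

Each record on its own says very little. Joined on a single shared field, the fictional profile now has a name to drop, an employer, a job title, a phone number and a home town.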

Is this not enough information to carry out social engineering attacks?

One example of a social engineering attack is the Macau scam, which to date has caused total losses of RM15.26 million for 556 victims, if not more.

While we let that sink in, how exactly do we feel right now?

The gist: Our data is open to misuse

Let’s get back to the point of this article.

The method that one party used to gather our data seems legitimate: we had consented and willingly given it to them, as I did with Clubhouse.

But it quickly turned into a potential cybersecurity event, because user privacy is the last thing on Clubhouse’s mind. Why did Clubhouse not prevent the rapid and automated scraping of its user profiles? Many social media platforms, like Facebook and LinkedIn, also do not bake privacy considerations into their platforms.
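One common way to slow down rapid, automated scraping is to rate-limit each API client. The sketch below is a generic token-bucket limiter, not a description of how Clubhouse's API works or should have worked. The class name, the client key and the limits are all hypothetical.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket: a client may burst up to `capacity` requests,
    after which it is held to `refill_rate` requests per second."""

    def __init__(self, capacity=5, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the example can freeze time
        self._buckets = {}  # client_id -> (tokens remaining, last update time)

    def allow(self, client_id):
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Top up tokens for the time elapsed, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False


# Simulate a scraper firing 100 profile lookups at the same instant.
limiter = TokenBucketLimiter(capacity=5, refill_rate=1.0, clock=lambda: 0.0)
served = sum(limiter.allow("scraper-key") for _ in range(100))
print(served)  # 5: the burst allowance is spent and the rest are refused
```

A limiter like this does not make public data private. It only makes bulk harvesting slower and easier to notice, which is why it needs to sit alongside decisions about what a profile endpoint should expose in the first place.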

Prof asked during SecurityLah!’s privacy episode, “Don’t these online service providers have the responsibility to ensure that our data is not misused?”

Data scraping is not illegal, but Clubhouse’s API was abused when someone collected all of that publicly available information… and without the users’ express consent.

There are many, many more situations where our data, or the data we hold about others, is misused without our knowledge. In a podcast episode, Doc introduces these situations, which may be happening to us on a daily basis without us realising how we contribute to them.

The bigger real issue: Accountability

There were many news and social media responses to the label “Clubhouse data breach”. A fair number did not think it was a data breach because the information was (1) publicly available and (2) not sensitive.

I think Tom’s Guide took the right and sustainable approach to this incident. It reports that we are too focused on the cybersecurity aspect of incidents, which is the wrong lens to be looking through.

If we focus only on cybersecurity, how are we going to take the steps to address and rectify privacy violations?

Our information is never in the clear and safe from misuse, whether it was obtained with our knowledge and consent or without.


Gauging data misuse by whether a cybersecurity breach technically took place forgets about privacy. Ultimately, we must not forget the privacy aspects, because to do so would absolve online services like Clubhouse and Facebook from ACCOUNTABILITY.

It does not matter how the data got out. Be it via a data breach or with our consent … the cost is the same.