I know most of the comments here are about the use of GA, but I just want to comment on the main point to say I think it's pretty great they are going with an open-source-first attitude. I work for a big intergovernmental org, and I can say that most, if not all, of our projects are very much closed source. :/
And importantly, when to use a deadbolt, or a deadbolt plus a steel-frame door, or a safe. All of which are well understood and can be implemented as circumstances warrant, i.e. tool shed vs. bank vault.
I try to answer this every time the question is asked; my last attempt seemed to be satisfactory to most people so easiest to just link to it: https://news.ycombinator.com/item?id=15070904
And I try to bring it up every time gov.uk is mentioned.
Safe harbour provisions and contractual agreements of this sort are effectively worthless when it comes to crossing borders, particularly where the US is involved.
Using adblockers to turn it off is not acceptable, and won't protect the majority of less tech-savvy folks.
If you can use Piwik for things you think must be more secure then that tells me two things -
1. It's possible for you to use Piwik (a rough sketch of what that looks like is below)
2. You don't believe that sending sensitive data to Google is always a good idea either. You just don't think that most government-citizen interactions are sufficiently sensitive, for some reason.
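For what it's worth, the stock Piwik client-side snippet is close to a drop-in replacement for the GA one. A minimal sketch (the analytics host and site ID here are placeholders, not real GOV.UK values):

    // Standard Piwik tracker bootstrap; host and site ID are placeholders.
    const _paq: any[] = ((window as any)._paq = (window as any)._paq || []);
    _paq.push(['trackPageView']);
    _paq.push(['enableLinkTracking']);
    (() => {
      const u = 'https://analytics.example.gov.uk/'; // self-hosted, so data stays on your own servers
      _paq.push(['setTrackerUrl', u + 'piwik.php']);
      _paq.push(['setSiteId', '1']);
      const g = document.createElement('script');
      g.async = true;
      g.src = u + 'piwik.js';
      document.head.appendChild(g);
    })();

The real cost is in running and scaling the Piwik backend, not in the page-side integration.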
Sure, it’s possible to run your own analytics architecture, but at 100m visits a month is it practical? Point two of the GOV.UK design principles is to do less: https://www.gov.uk/design-principles#second . When GOV.UK first started it definitely had to hit the ground running, which required evaluating user data immediately. Certainly at the time there weren’t enough dev hours available, given a limited number of people, to evaluate, build and run an analytics framework of the scale necessary.
Having said that, then isn’t now, and I wouldn’t be surprised if a future Government as a Platform service is a cross-government analytics system hosted in the UK. I just don’t see the pressing need for it, based on the assumption that Google actually are anonymising the data. If you want to disagree with that assumption then that’s a valid viewpoint too, but I see no evidence for it.
> based on the assumption that Google actually are anonymising the data. If you want to disagree with that assumption then that’s a valid viewpoint too, but I see no evidence for it.
This is exactly the wrong way around. For every sensitive area, such as privacy, the burden is on the company to prove proper handling of data. But if that company is outside your jurisdiction, with no legal means available in their country, how could that ever be possible?
Taking just their word for it is like trusting the food industry on hygiene until its customers become undeniably sick.
A better strategy is not to create sensitive datasets in the first place. In Germany, this principle is called "Datensparsamkeit", which could be translated to "data frugality".
Moreover, every country should have something like the FDA for data hygiene. Unfortunately, even in Germany where we do have "Datenschutzbeauftragte" (data protection officers), they can make a lot of noise but don't have much power. This is still better than not having them at all, though.
I’m not disagreeing with you outright, but your argument can be extended ad infinitum. For instance, GOV.UK PaaS uses AWS as a host[1]. Is that worrying? Should we not be using Cisco gear because of US govt backdoors? Should we not be using Chinese-manufactured chips? These are considerations the military faces daily, but they hinder the ability to deliver. Analytics is arguable both ways (and I lean towards your argument at this point in time), but there are good reasons, past and present, for GA.
These aren't really equivalent to actively sending data about anyone using your pages out to overseas entities.
You absolutely should be taking precautions to make sure that what you're doing on AWS is secure. And in fact you should err on the side of using providers who host in European countries, preferably European organisations.
(addendum - I spent a lot of the early part of this year working on AWS-based data processing systems for a large bank; they took massive precautions with the transport and storage of their data within the AWS system, including IPSec overlays, 14-day maximum node lifetimes and various other things. I realise that at some point you're trusting Amazon, but there's a lot that can be done to avoid having problems in the first place. "Not sending data to places you don't absolutely have to" seems pretty basic)
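To make the node-lifetime point concrete, here's a rough sketch of the kind of scheduled job that enforces it, using the AWS SDK v3 for JavaScript. The 14-day cut-off matches the anecdote above; everything else (region, filters) is illustrative, not what the bank actually ran:

    import { EC2Client, DescribeInstancesCommand, TerminateInstancesCommand } from '@aws-sdk/client-ec2';

    const MAX_AGE_MS = 14 * 24 * 60 * 60 * 1000;        // 14-day maximum node lifetime
    const ec2 = new EC2Client({ region: 'eu-west-2' });  // illustrative region

    async function rotateOldNodes(): Promise<void> {
      // Find running instances older than the cut-off...
      const { Reservations = [] } = await ec2.send(new DescribeInstancesCommand({
        Filters: [{ Name: 'instance-state-name', Values: ['running'] }],
      }));
      const stale = Reservations
        .flatMap(r => r.Instances ?? [])
        .filter(i => i.LaunchTime && Date.now() - i.LaunchTime.getTime() > MAX_AGE_MS)
        .map(i => i.InstanceId!);
      // ...and terminate them; replacements are assumed to come from an auto scaling group.
      if (stale.length > 0) {
        await ec2.send(new TerminateInstancesCommand({ InstanceIds: stale }));
      }
    }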
With Google Analytics, the data going to Google is a certainty, and reasonable alternatives are available.
I think it shows a terrible attitude from the department responsible to use GA. Not everything should be done by the lowest bidder, regardless of the costs. Maybe I'll write to my MP...
Or, in full: "Government should only do what only government can do. If we’ve found a way of doing something that works, we should make it reusable and shareable instead of reinventing the wheel every time. This means building platforms and registers others can build upon, providing resources (like APIs) that others can use, and linking to the work of others. We should concentrate on the irreducible core."
Slightly different from the title.
Can't do privacy because the principle says "Do Less"? Why do anything?
>> Sure, it’s possible to run your own analytics architecture, but at 100m visits a month is it practical?
That's surely secondary to privacy and security?
>> If you want to disagree with that assumption then that’s a valid viewpoint too, but I see no evidence for it.
I see a whole litany of problems with your assumption, based on the state of the world in terms of large-scale leaks and hacks, on the laws in the US which are much weaker than our own protections, and on the actions of various US agencies when they desire access to privately held information.
I see no reason to believe that anonymisation of the data somewhere within Google's infrastructure is an adequate protection, whether it actually happens as contracted or not.
Look, it's clear that you don't consider this important enough to do something about. Some of us do, and some of us would rather that if you can't do this with private analytics, you don't do it at all. And if that makes delivering your service harder, slower and more expensive - so be it. What's happening here is not right.
I mean, what you're saying when you reference point 2 of your design principles is that it's alright to throw user privacy under a bus, so long as you can deliver the software effectively. This isn't a good justification.
I don’t actually work for GDS anymore (or even live in the UK) but people there are receptive. Can you leave a comment against the blog post? Unfortunately I don’t know the best way to raise your concerns officially, but that would hopefully get an answer.
I have worked and continue to work on several GOV.UK projects, and we take steps to ensure that anything that gets sent to GA is as anonymised as possible. Just last week we had a ticket in planning to further improve the anonymity of the data sent to GA for the UK MOT testing service, so it is definitely not like we're just firing all sorts of data off to third parties; we are thinking very hard about this stuff.
Preserving our users’ anonymity is one of our major priorities and we actively work to improve how we do this.
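For anyone wondering what this looks like in practice: one of the standard measures is simply asking GA to anonymise the IP before it's stored. This is only an illustrative sketch using the public analytics.js API, not the actual GOV.UK tracking code, and the property ID is a placeholder:

    // Illustrative analytics.js setup; 'UA-XXXXXXXX-X' is a placeholder property ID.
    declare const ga: (...args: unknown[]) => void;

    ga('create', 'UA-XXXXXXXX-X', 'auto');
    ga('set', 'anonymizeIp', true);   // zero out the last octet of the IP before Google stores it
    ga('send', 'pageview');

The rest of this sort of work is typically about keeping personally identifying details (query strings, form values, reference numbers) out of the URLs and events that get sent in the first place.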
.gov organizations, like your boring standard-issue commercial ones, also want to know who is using their site and how. It helps inform decision-making about changes.
If you don't mind another dumb question, is the benefit great enough for the hassle of complaints and the possible compromise of ostensibly private information?
There are self-hosted solutions, but I imagine they'd require experts to implement and interpret. I wonder if they could just do A/B testing and get the same results, or if they could simply do surveys? Though I suppose those come with new faults, variables, and expenses.
Generally, yes. It's also good for stuff like knowing what browsers are used and thus what technologies can be used.
The hassle of complaints and the possible compromise of "private" information (like what your browser sends) isn't that significant. The pain from complaints is trivial, largely confined to Hacker News threads of no significance whatsoever. The ostensibly private data is protected contractually and legally.
I'm not aware of any approach that both yields useful information and eliminates any possible risk of compromise. Surveys and A/B testing and self-hosted solutions all have the same problems and risks, and generally extra costs to boot.
In short, the cost-benefit analysis isn't all that different from that of a company considering an analytics tool.
Again, much thanks. No more questions, for now. Just a sincere thanks. I don't actually read HN for the articles. I do read many of them, but I'm here for the informative answers, insightful responses, and chances to get an education in subjects I'd not normally consider.
Once in a while, chaos theory or traffic modeling come up. So, I get to give back. ;-)
And I'll add another thing: this contract is between a government agency's bureaucrats and a private corporation, but it affects unknown numbers of private citizens who, really, have no choice but to have dealings with their government. Claiming that the contract is sufficient is laughable, even if Google didn't have every reason in the world to disregard it and simply pay the fine associated with disregarding it.
Satisfactory? Saying "if you don’t trust them then that’s fine" and ignoring anyone who disagrees that that's fine isn't giving an answer -- including the poster you just replied to with this link. I don't believe it's exactly easy to achieve and maintain such a lack of self-awareness.
This was brought up before when some other gov.uk site was discussed. One of the commenters mentioned that gov.uk has a strict licence agreement which prevents Google from accessing analytics data. I don't have a source for that, but it would make sense.
> One of the commenters mentioned that gov.uk has a strict licence agreement which prevents Google from accessing analytics data.
The problem is that a license agreement prevents nothing; it may establish consequences for an action, should it occur and should it be detected and should the consequences be enforced, but it is unable to prevent that action from occurring.
That's the difference between 'can' and 'may.' A license agreement can't prevent something; a technical measure (e.g., not using Google Analytics or using some form of privacy-preserving analytics protocol) can.
If there's anything that human history teaches us, it's that Murphy's Law (that whatever can go wrong, will) holds: if it's possible for someone to do something one would rather he not, he probably will, sooner or later.
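To make "technical measure" concrete, here's a toy sketch (the names and paths are mine, not from any gov.uk system) of the kind of thing a self-hosted setup can do: throw away the end of the IP address before anything is written down, so there is nothing left for an agreement to have to protect.

    // Toy example: strip the last octet of an IPv4 address before logging a visit.
    // If the full address is never stored, no contract is needed to keep it private.
    function truncateIp(ip: string): string {
      const parts = ip.split('.');
      return parts.length === 4 ? `${parts[0]}.${parts[1]}.${parts[2]}.0` : 'unknown';
    }

    function logVisit(path: string, remoteAddr: string): void {
      console.log(JSON.stringify({ path, ip: truncateIp(remoteAddr), ts: Date.now() }));
    }

    logVisit('/some-government-service', '203.0.113.42');  // records 203.0.113.0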
Sarcasm aside, I think this is an important point, and I would like to know if you or anyone here has suggestions for plug-and-play analytics solutions to look into for those of us who would prefer to avoid Google Analytics?
I think the judiciary gov.uk cookie usage information [1] provides good detail on their use of GA. In addition it provides a link to a Google-developed browser plug-in to 'opt you out' [2].
Only Google Analytics? I tried to watch an online news stream from our national broadcaster yesterday (http://www.abc.net.au/news/newschannel/) with NoScript turned on and found it had about 8 different trackers installed and refused to work unless they were enabled.
Not really contributing to the discussion, mate.
It's analytics, it's a good product, people use it because it's convenient.
What bearing does it have on the words in the article?
They are implying that it's a massive security and privacy hole and they think taking security advice from somebody "who has a key under the doormat" is problematic.
Not that I necessarily agree with the original comment but that's the reasoning.
Could I not make the same argument that using English is just making the NSA's job easier? I mean, they do tap the wires, don't they, making much of this moot.
Yup, and if you have a news website or a shop, go ahead. I specifically object to my own government sending data about citizen interactions with that government to an overseas third party.
Yes, it can be, and I believe convictions show this - if "public" means "someone else could have stumbled on the info, in theory" but you were specifically tipped off where to go and when in exchange for cash. Which is the kind of worry here.
But as well, policy discussions that allow no privacy tend to be very circumscribed.
For tax policy, it would make sense to solve this problem in the entirely opposite way - i.e. declare that any tax policy changes cannot take effect faster than the next fiscal year; it's also fair and much more reasonable.
Let's say they decide to put a tariff on something a lot of people hate or think problematic. Let's say, for argument's sake, 2-stroke generators. But they have a lot of power (and users) so you don't think you can swing a yearly tax. But a tax on new purchases/builds might be tractable. They get to keep all the existing ones but they can't build more. That's grandfathering.
If you give the wrong people until the end of the year, they'll grandfather in as many as they can, even if they sit unused. So now you're putting off curbing the installed base and instead you've actually created a glut. Now your tax will have zero net effect for four or five years and will make things (say, smog) far worse in the short term.
It can actually end up unwinding your tax (or regulation) --
If the people rushing to buy up the supply to simply store them off-line happen to cut off people who need them for what are sensible and essential reasons, you end up with a litany of cases of the tax causing problems paraded about as it comes into effect, and it ends up repealed or undercut because of political pressure.
(Usually 1-year window extended because "we need more time to phase it in" and by the time the second window would close, it's gutted.)
It's not just that it can end up counter productive in the short term, it's that it can undercut itself as a law as well by destabilizing a market that usually has low, but important volume.
In some places, laws already work sort of like that. Except for emergency bills, they can't take effect until something like 180 days after congress has had their final session.
It makes perfect sense, to me at least, for taxation regulations to be the same. Of course, they'd probably just default to calling all rule changes emergencies...
Sort of related: I love the idea of paying taxes. Mine are prepared and filed as soon as I can do so. I have a great time paying my taxes, to the point where I bring a bottle of wine and flowers to my accountant. But, man, I really hate the complexity and how the money is spent.
Anyhow, maybe they should only make rules for the first half of the year and then none of the changes go into effect until the first day of the year? I like your idea.
Fraud/spam detection algorithms have to be security by obscurity, or someone can just check if what they're doing trips the algorithm and then not do that.
If you use an opaque machine learning technique to detect fraud, then for the purposes of probity some kind of "parallel construction" could be employed to give a public explanation; it's surely important to explain and justify to the humans involved. The analogy is with the person of a particular ethnicity who is always getting pulled over by the cops. This must be made transparent.
You don't need "parallel construction". You need to inform it was detected by some opaque means, and that you proceeded to verify it with a transparent and legal procedure.
What you can't do is detect it by opaque means and then proceed to sanction people, or even gain unduly elevated investigative powers.
The act of verification (of suspected fraud) is not costless to the various parties involved. So say I keep getting inconvenienced by this verification because of my behaviour ("was it really you who made this request?"), then someone should be accountable for all those false positives. "Parallel construction" creates accountability. I know it usually means something more sinister -- but just syntactically it's a good description of the requirement.
I do think we are trying to say the same thing. When I say that no unreasonable additional power should be gained by such detection, I mean, among other things, that the verification should not be costly to the investigated party (the costs should fall only on the investigators). It looks like you are naming that verification "parallel construction", and so, I do agree.
In the past I really worried about open source networked software, because however you slice it, it is missing a huge chunk of security by obscurity, making 0days very likely. Especially for new projects. All security vulnerabilities are out in the open for all to see. Sure, honeypots can help you learn about vulnerabilities, but it will take years to patch them all. In the meantime, everyone using your software is vulnerable.
Then I discovered blockchains. Here, the software is run by the network and does nothing persistent unless a majority of the nodes agree. That makes it much harder to corrupt the persistence layer. Blockchains are NOT just for achieving global consensus about a ledger. They can be per-stream-of-data. That's the approach we take at Qbix.
There are still many other vectors of attack besides corrupting the database. However, in web apps, the really pernicious thing is corrupting the data. Everything else has already been secured by webserver makers and language runtime designers.
PS: Finally, you can corrupt things at the client level, e.g. making a client sign a transaction the user didn't authorize. But at least it is localized to the corrupted clients, and not the whole network.
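For what it's worth, here's a toy sketch of the general shape of a per-stream chain with majority sign-off. This is emphatically not Qbix's actual implementation, just an illustration of the idea that nothing persists unless enough nodes agree:

    import { createHash } from 'crypto';

    interface Entry { prevHash: string; payload: string; hash: string; }

    // Append to a single stream's chain only if a majority of nodes acknowledged the write.
    function appendEntry(chain: Entry[], payload: string, acks: number, totalNodes: number): Entry | null {
      if (acks * 2 <= totalNodes) return null;             // no majority, nothing is persisted
      const prevHash = chain.length ? chain[chain.length - 1].hash : '0'.repeat(64);
      const hash = createHash('sha256').update(prevHash + payload).digest('hex');
      const entry: Entry = { prevHash, payload, hash };
      chain.push(entry);
      return entry;
    }

Each stream gets its own chain, so consensus is scoped to that stream's participants rather than one global ledger.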