herraiz.org | Blog
First steps in Deep Learning (workshop @ KSchool)
Although I am not the biggest fan of notebooks for data science projects, they are undoubtedly a great tool for purposes like teaching or delivering a talk.
In fact, yesterday I delivered a small workshop at KSchool on taking your first baby steps in deep learning, using Google Colab, Google's Python notebook tool.
Google Colab works like Jupyter, except that several people can work on the same notebook at once, à la Google Docs. But it comes with even nicer things: you can use a Tesla K80 GPU with TensorFlow and Keras, to have lightning-fast training even when dealing with large datasets. For free, just with your Google account.
In the workshop yesterday, we used a small dataset, the famous MNIST handwritten digits set. But I have tried working with much larger images in Colab (loading them from Google Drive!), and it works like a charm.
If you are curious about Deep Learning, or about using a GPU entirely for free, have a look at the Keras notebook I prepared for the workshop yesterday:
If you don't have a Google account, grab the notebook from this open link. You can download it anonymously and run it locally using Jupyter (with Python 3 and Keras).
You can download it and use it locally with Jupyter, or copy it to your Google Drive to run it in your environment at Google Colab.
The notebook includes detailed explanations of every step we did at the workshop, and links to external materials and videos, just in case you want to expand on some of the details.
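For a flavour of what those first steps look like, here is a minimal sketch of a small MNIST classifier in Keras. It is not taken from the workshop notebook: the layer sizes, batch size, number of epochs, and the function names (`load_mnist`, `build_model`, `train`) are illustrative assumptions.

```python
"""Minimal MNIST classifier with Keras (a sketch, not the workshop notebook)."""

NUM_CLASSES = 10        # digits 0 to 9
INPUT_SHAPE = (28, 28)  # MNIST images are 28x28 grayscale pixels


def load_mnist():
    """Download MNIST and scale pixel values from 0-255 to 0-1."""
    # TensorFlow is imported inside the functions so that this file can be
    # loaded even on a machine where TensorFlow is not installed.
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    return (x_train / 255.0, y_train), (x_test / 255.0, y_test)


def build_model():
    """A small fully connected network; the sizes are arbitrary choices."""
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Labels are integers 0-9, hence the "sparse" variant of the loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(epochs=5):
    """Train on MNIST and return the accuracy on the test set."""
    import tensorflow as tf

    # On Colab with a GPU runtime this list should not be empty.
    print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))

    (x_train, y_train), (x_test, y_test) = load_mnist()
    model = build_model()
    model.fit(x_train, y_train, epochs=epochs, batch_size=128,
              validation_split=0.1)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy
```

Calling `train()` in a notebook cell downloads the dataset, trains the model, and returns the test accuracy. Nothing in the code changes between CPU and GPU; TensorFlow uses the GPU automatically when one is available.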
Killing machines - why data science needs software engineering
Jupyter notebooks (and other notebook-based systems) are a very popular and handy tool in data science. With a notebook, you can quickly explore different alternatives, get immediate feedback, produce plots, combine code and text in a form of literate programming, and share your results without hassle. That sharing is a key enabler of reproducibility, an essential feature of any empirical discipline.
However, notebooks are not enough.
Why? Let's get to know a story about artificial intelligence and the singularity.
Elon Musk and Mark Zuckerberg have recently engaged in a public debate about the perils of artificial intelligence (AI). On one side of the debate, Elon Musk argues that AI may give birth to future killing machines that will exploit humanity as slaves. The moment when this happens is known as the singularity. From that moment on, machines may autonomously decide to kill us, or to enslave us, or to do to us anything they please.
What Elon Musk fails to notice is that we have already created machines that kill humans autonomously: the Therac-25, a radiotherapy machine that killed 4 people and severely injured 2 more.
How did we as humankind design such a horrible machine? What kind of hyper-sophisticated super intelligence did we create for the Therac-25?
It was much easier than that. We were just sloppy. We did not follow (now common) best practices for software development.
Whether we like it or not, we as data scientists write code on a daily basis. If you don't want humankind to be enslaved by machines, don't forget: notebooks are great, but they are not enough. You need software engineering too.
Want to know more? Find the story and links with more details in the slides I presented at Databeers Madrid on June 29:
Trying to write from iPad
My blog works using Jekyll and Org Mode, along with a Git repository. Every time I push a new Org file to the repository (a new blog post), it gets translated into HTML by Emacs, and published on the web.
This is very cool, as I can write in the blog while I am offline, and just push the changes whenever I get online again.
But there is a big drawback: I can only post from devices that run Emacs.
In recent months I have started to write more and more with an iPad, and I have always felt that it really sucks that I cannot post to my blog directly from the iPad.
After some investigation, I think I have found a setup that will allow me to write from the iPad, and maybe this will also mean that I revive my blog.
In order to publish a blog post from the iPad, I need two pieces:
- A text editor, with nice syntax highlighting if possible
- A Git client that allows me to push to my personal repo using SSH
I have not been able to find a text editor with Org mode syntax highlighting. But other than that, Textastic works just fine.
It integrates very well with Working Copy, a very nice Git client for iPad. This app lets me commit to a local clone on the iPad, and then push to my server.
So now, I can write the Org file in the editor, save it in Working Copy, commit and push it. The hooks in my cloned repo on the server, where Emacs and Jekyll are installed, just take care of the rest.
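To make that last step concrete, here is a minimal sketch of what such a server-side hook could look like, written in Python as a Git `post-receive` hook. It is an illustration rather than the hook this blog actually runs: the paths, the branch name, the Emacs export command, and all function names are assumptions, and it presumes a bare repository with a separate working tree.

```python
"""Sketch of a Git post-receive hook that publishes pushed Org files."""
import subprocess
import sys

# All paths and the branch name below are hypothetical placeholders.
WORK_TREE = "/srv/blog/src"    # checkout where Emacs and Jekyll run
SITE_DIR = "/srv/blog/public"  # directory served by the web server
BRANCH = "master"
BRANCH_REF = "refs/heads/" + BRANCH
ZERO_SHA = "0" * 40            # Git uses this id to mean "no commit"


def parse_updates(lines):
    """Parse post-receive input: one 'old new ref' triple per line."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def org_files(paths):
    """Keep only the Org files from a list of changed paths."""
    return [path for path in paths if path.endswith(".org")]


def export_command(org_file):
    """Emacs batch command that exports one Org file to HTML (assumed setup)."""
    return ["emacs", "--batch", org_file, "-f", "org-html-export-to-html"]


def build_command():
    """Jekyll command that rebuilds the site into SITE_DIR."""
    return ["jekyll", "build", "--source", WORK_TREE, "--destination", SITE_DIR]


def changed_paths(old, new):
    """List the files added or modified between two commits."""
    if old == ZERO_SHA:
        # Newly created branch: consider every file in the pushed commit.
        cmd = ["git", "ls-tree", "-r", "--name-only", new]
    else:
        cmd = ["git", "diff", "--name-only", "--diff-filter=AM", old, new]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def main(lines):
    """Publish every Org file that was pushed to the blog branch."""
    for old, new, ref in parse_updates(lines):
        if ref != BRANCH_REF or new == ZERO_SHA:
            continue  # ignore other branches and branch deletions
        posts = org_files(changed_paths(old, new))
        if not posts:
            continue
        # Refresh the working tree, export each post, then rebuild the site.
        subprocess.run(
            ["git", "--work-tree=" + WORK_TREE, "checkout", "-f", BRANCH],
            check=True,
        )
        for post in posts:
            subprocess.run(export_command(post), cwd=WORK_TREE, check=True)
        subprocess.run(build_command(), check=True)
```

To use something like this as a real hook, you would save it as `hooks/post-receive` in the server repository, make it executable, and add a final line calling `main(sys.stdin)`, since Git passes the updated refs to the hook on standard input.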
Older posts
- Work like in the teams of the future (Aug 11 2014)
- Will it work in the MIR? (Jul 10 2014)
- Intensive Metrics for Software Evolution (May 17 2013)
- Don't do empirical software engineering with Excel (Nov 23 2012)
- The impact of bias in bug-fix datasets for defects prediction (Apr 15 2012)
- Visiting UC Davis (Apr 02 2012)
- Popularity bias in bug datasets (Nov 01 2011)
- IJSODIT - Call for papers 2012 (Sep 29 2011)
- The interplay between businesses and open source (Sep 08 2011)
- Software and the game of life (Jul 29 2011)
- What's the distribution of software size? (Jul 20 2011)
- Software projects alzheimer: Julian Assange's lost contributions (Jul 07 2011)
- Practical Analyses of Software Engineering Data (Jun 15 2011)
- Empirical Software Engineering in Practice -- CFP 2011 (Jun 13 2011)
- Grafiti no es negocio -- Mi visión sobre las acampadas (May 25 2011)
- IJSODIT - Call for papers 2011 (Mar 29 2011)
- Mis impresiones sobre el Día Garum (Mar 05 2011)
- Nos vamos a Bilbao (Feb 15 2011)
- Reflexiones sobre el ciberpunk (Feb 03 2011)
- The dynamics of software evolution (Jan 24 2011)
- ¿Cómo he llegado al itinerario? (Jan 10 2011)
- ¡Hola itinerario! (Jan 04 2011)
- Debian finally shipping a free kernel (Dec 15 2010)
- Freenet, an anonymous and distributed network (Dec 11 2010)
- PyTwerp working again with Twitter (Dec 10 2010)
- "Making software" is out! (Nov 22 2010)
- Do featured articles get more visits in Wikipedia? (Nov 15 2010)
- What is the MSR challenge? (Oct 11 2010)
- IWESEP 2010 -- International Workshop on Empirical Software Engineering in Practice (Aug 23 2010)
- Learning by doing (Aug 10 2010)
- Data for Mining Software Repositories (Jun 25 2010)
- The eye of the tiger: agile methods vs. architecture (Jun 21 2010)
- Code as design. Or what's the point of Software Engineering? (Apr 06 2010)
- Hello Linkedin (Apr 02 2010)
- Special issue of the IJOSSP (Feb 23 2010)
- Where are you? (Feb 05 2010)
- New GPG key (Jan 27 2010)
- Under attack (Jan 19 2010)
- Hello world (Jan 18 2010)