GSoC 2019: Project Ideas

This is the project idea list for the Google Summer of Code 2019 program. We have a mix of projects that are meant to be installed widely (such as plugins for other software) and projects that are more focused on improving user experience for users of Creative Commons licenses. Regardless of scope, these projects all have a broad and positive community impact.



Automated license link checking

  • Description:

    Automate testing of the links in the licenses (legal code) and the deeds (simple language descriptions of the legal code).

  • Rationale:

    Our licenses are used all over the world in many languages. When bugs or human error result in broken links, the community suffers and development is slowed.

  • Resources:
  • Expected result:

    Public release of Python software that ingests the license source files and creates a configuration for existing Open Source link checking software.

  • Skills recommended: Python
  • Mentors: Alden Page (primary), Timid Robot Zehta (backup)
  • Difficulty: Medium
  • Proposal tag to use: Licenses
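
As a rough illustration of the ingestion step described in this idea, the following minimal sketch collects the link targets from one license or deed HTML file using only the Python standard library. The names `LinkCollector` and `extract_links` and the sample markup are assumptions made for illustration. How the resulting URL list would be turned into a configuration for a particular link checker is left open.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the unique link targets in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    seen = set()
    unique = []
    for url in collector.links:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


# Hypothetical fragment standing in for a deed file.
SAMPLE_DEED = """
<html><body>
<a href="https://creativecommons.org/licenses/by/4.0/legalcode">Legal code</a>
<a href="https://creativecommons.org/licenses/by/4.0/legalcode">Read the license</a>
<a href="/about/">About</a>
<a name="top">No href here</a>
</body></html>
"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE_DEED):
        print(url)
```

Relative targets such as `/about/` are returned unchanged; resolving them against a base URL would be a natural next step before handing the list to a checker.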

CC Search JavaScript library

  • Description:

    A JavaScript library that allows users to search for CC-licensed content using the CC Catalog API.

  • Rationale:

    Providing a JavaScript library that allows integration of searching for CC-licensed content into other websites will expand the reach of CC-licensed content.

  • Resources:
  • Expected result:

    Public release of a functional, up-to-date, and well-documented JavaScript library using the CC Catalog API.

  • Skills recommended: JavaScript
  • Mentors: Breno Ferreira (primary), Alden Page (backup)
  • Difficulty: Medium
  • Proposal tag to use: Plugin

CC Search: search by use case

  • Description:

    Prototype in CC Search a way to search for specific materials to use for specific types of projects. For example: search images for slide presentations (stock photos), for printed and/or digital magazines (which might require high resolution), for educational material (use content from GLAM providers), etc.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for link to all insights):

    • People want to share and find good work, but find it difficult to navigate the abundance of content and information online.
  • Resources:
  • Expected result:

    A new feature or set of new features added to CC Search to support searching CC-licensed content for a specific purpose. You can decide what use cases you'd like to focus on.

  • Skills recommended: Python and JavaScript
  • Mentors: Alden Page (primary), Breno Ferreira (backup)
  • Difficulty: Medium
  • Proposal tag to use: Search

Contact content creators easily

  • Description:

    Prototype an easy way for a user to get in touch with a creator and/or vice versa that ties to a CC license or tool. This could be done in a number of ways, including a button that is chosen from a new CC chooser, a deed + platform solution that connects users to creators, or a separate “contact me” button.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for link to all insights):

    • People like seeing how their work is used, where it goes, and who it touches, but have no easy way to find this out. This insight incorporated the following two insights:
      • People care that the work they share resonates with people, especially personally, but can only know this if they are told directly by the person it resonated with.
      • People want their work to have real world or social impact, but their sense of what these impacts are is vague. However, people can identify some real or potential outcomes from sharing their work that they enjoy.
  • Resources:
  • Expected result:

    A working tool that ties into an existing CC tool that allows users to get in touch with creators of CC-licensed content. The implementation details and design are up to you.

  • Skills recommended: JavaScript, Python
  • Mentors: Timid Robot Zehta (primary), Hugo Solar (backup)
  • Difficulty: Medium
  • Proposal tag to use: Usability

Copyright status and public domain information awareness tool

  • Description:

    A tool that promotes awareness of the copyright status and public domain information (e.g. how long it has been in the public domain) of works, using the Wikidata properties associated with each work.

    This is an open ended project and there are a lot of tools that can be built to improve copyright status awareness. We'd like a proposal for a single tool to build. Here are a few ideas that we have:

    • a standalone tool or service that acts as an interface on top of Wikidata that displays works according to their copyright status and allows people to edit the information easily.
    • an improved process for bulk-updating Wikidata with information from other sources (e.g. Metropolitan Museum of Art, Cleveland Museum of Art open APIs)
    • an API where cultural heritage institutions could make bulk queries around copyright status.
    • a browser extension that parses and analyzes works from the current website and displays the status of those works.
  • Rationale:

    We'd like public domain resources to be reused and remixed, and a good first step toward that is for people to be aware of what content is free to use.

  • Resources:
  • Expected result:

    An open-source software project that makes the public domain date and copyright status information of works on Wikidata easier to update or use.

  • Skills recommended: Python, JavaScript
  • Mentors: Sophine Clachar (primary), Kriti Godey (backup)
  • Difficulty: Hard
  • Proposal tag to use: Data Visualization

Creative Commons Archive

  • Description:

    Prototype a few concepts that give creators the choice of archiving a version of their works when CC licensing. This could be an archive we provide as a service, a feature tied to a new chooser tool, a separate web page for preserving your work, or a partnership with an organization like the Internet Archive.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for link to all insights):

    • People like the efficiency of sharing via centralized platforms, but are frustrated by the lack of control and ownership over their work, and increasing devaluation of individual creativity.
    • People have a desire to create work that is lasting and meaningful, that eventually has a life of its own, but don’t know what to do with a work beyond publishing it.
  • Resources:
  • Expected result:

    A working interface that lets users archive their CC-licensed content. The potential designs linked above are just ideas; you do not have to follow them, although you can certainly build on them. Both the backend and the frontend should be functional.

  • Skills recommended: Python, JavaScript, basic understanding of databases, basic understanding of APIs
  • Mentors: Alden Page (primary), Sophine Clachar (backup)
  • Difficulty: Medium
  • Proposal tag to use: Usability

Creative Commons JavaScript license chooser

  • Description:

    Our license chooser JavaScript widget has not been updated since 2015. We'd like to update it so that it is functional and follows modern coding style and best practices, including not hardcoding license information and removing its dependency on PHP.

  • Rationale:

    Providing a functional JavaScript widget that allows anyone to embed a CC license chooser on their site will widen the reach of CC licenses.

  • Resources:
  • Expected result:

    Public release of a functional, up-to-date, and well-documented JavaScript widget or library.

  • Skills recommended: JavaScript
  • Mentors: Breno Ferreira (primary), Kriti Godey (backup)
  • Difficulty: Medium
  • Proposal tag to use: Plugin

Creative Commons plugin for any creation platform or tool

  • Description:

    We'd like to create plugins for other platforms to help users find CC-licensed content (using the CC Catalog API), attribute it correctly, and license their own work under CC licenses. Any creation platform or software is fair game. Some ideas: Google Docs/Slides, Microsoft Office, browser extensions, etc.

  • Rationale:

    We would like to integrate with as many content creation platforms and tools as seamlessly as possible to promote use of CC licenses and discovery of CC-licensed content.

  • Resources:
  • Expected result:

    Public release of a functional and well-documented plugin.

  • Skills recommended: depends on the project
  • Mentors: Timid Robot Zehta (primary), Kriti Godey (backup)
  • Difficulty: Medium
  • Proposal tag to use: Plugin

Creative Commons Resource Archive

  • Description:

    We have an old website that collects resources related to using Creative Commons licenses. It is extremely outdated and we would like to rewrite it from the ground up to make it easy for non-technical users to update and to match the style of CC's current website.

  • Rationale:

    A place to collect resources about CC licensing would be very useful to our users but the site as-is is not usable. The design is not modern or intuitive and since it uses a static-site generator, it is not easy for non-technical users to add content, which leads to it being out of date.

  • Resources:
  • Expected result:

    Release of an up-to-date site that provides an interface for non-technical users to add content. It should have a more modern design that matches CC's current website. It should be easily deployable and the code should be documented. We also want to take advantage of existing content and avoid duplication, so it would be desirable to gather the existing online resources from all the Creative Commons websites and from external services used by CC (e.g. Vimeo, YouTube, Flickr).

  • Skills recommended: Python or WordPress/PHP, basic understanding of databases, basic knowledge of APIs, potentially JavaScript
  • Mentors: Hugo Solar (primary), Breno Ferreira (backup)
  • Difficulty: Easy
  • Proposal tag to use: Usability

Creative Commons WordPress plugin

  • Description:

    Our WordPress plugin has not been updated for two years. We'd like to update the plugin so that it is compatible with the latest version of WordPress and the code is in line with WordPress best practices. We'd also like to add features that help users find CC-licensed content, attribute it correctly, and license their own work under CC licenses.

  • Rationale:

    WordPress is one of the top platforms for creators on the internet who both produce and consume CC-licensed content. We would like to integrate with it as seamlessly as possible to promote use of CC licenses and discovery of CC-licensed content.

  • Resources:
  • Expected result:

    Public release of a functional, up-to-date, and well-documented WordPress plugin.

  • Skills recommended: PHP, WordPress, potentially JavaScript
  • Mentors: Hugo Solar (primary), Kriti Godey (backup)
  • Difficulty: Medium
  • Proposal tag to use: Plugin

New educational tool for CC licenses

  • Description:

    Prototype a new pathway and educational tool that clearly communicates the differences between CC licenses and leads the creator to the appropriate license for her needs. See the resources section for the link to the current license chooser.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for link to all insights):

    • People understand that CC stands for free content sharing, but the nuances of the specific licenses are lost on them — including experts and longtime CC users.
    • People are motivated to license their work under CC, but have a hard time figuring out how to do it.
  • Resources:
  • Expected result:

    A working interface for users to learn about CC licenses and pick the one most appropriate for their needs. The potential designs linked above are just ideas; you do not have to follow them, although you can certainly build on them.

  • Skills recommended: JavaScript
  • Mentors: Timid Robot Zehta (primary), Breno Ferreira (backup)
  • Difficulty: Easy
  • Proposal tag to use: Usability

No-click attribution for CC-licensed content

  • Description:

    Prototype a tool that removes all friction to correct attribution. This could play out in a number of ways, including having attribution and related information attach upon download of an image (0 click attribution) in CC search, an attribution filter/plugin service that bulk links attribution, or a credit that is automatically added by a platform or related service.

    Another approach to no-click attribution could be opt-out watermarking and, most importantly, metadata embedding. How can we add CC metadata to MP3 files, or EXIF-like content to photos? Is it possible to encourage advertisers to display a non-intrusive barcode or QR code with the bare work ID? Can the ID become a visual mark for the commons that is interesting enough to display? Imagine, for example, a T-shirt showing the unique ID CC-ID#1: looking it up online would reveal that the visual representation of that very ID is the licensed work.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for link to all insights):

    • People are motivated to give credit to other people, but they find attribution complicated and a hassle.
  • Resources:
  • Expected result:

    A new feature or set of features added to CC Search to make no-click attribution possible. Some high-level ideas are in the project description but the implementation is completely up to you.

  • Skills recommended: JavaScript, Python
  • Mentors: Alden Page (primary), Breno Ferreira (backup)
  • Difficulty: Hard
  • Proposal tag to use: Search
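
One building block shared by the approaches in this idea is generating the attribution text itself. The sketch below assembles a plain-text credit from a work's title, author, source, and license, the elements that CC's attribution guidance commonly recommends. The function `build_attribution`, its parameters, and the example values are hypothetical and not part of CC Search.

```python
def build_attribution(title, author, license_name, source_url=None, license_url=None):
    """Return a one-line, plain-text credit for a work.

    The source and license URLs are optional and are appended in
    parentheses when provided.
    """
    credit = f'"{title}" by {author}'
    if source_url:
        credit += f" ({source_url})"
    credit += f" is licensed under {license_name}"
    if license_url:
        credit += f" ({license_url})"
    return credit + "."


if __name__ == "__main__":
    # Made-up work used only to show the output format.
    print(
        build_attribution(
            "Sunset",
            "A. Author",
            "CC BY 4.0",
            source_url="https://example.org/sunset",
            license_url="https://creativecommons.org/licenses/by/4.0/",
        )
    )
```

A string like this could be copied to the clipboard on download or written into a file's metadata; which of those fits best is part of the design work for this project.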

Reward and delight users of CC licenses

  • Description:

    Prototype a small, fun idea that gives reward and delight to users, e.g. a graphic CC mascot overlaid to help users navigate the licensing process.

  • Rationale:

    Addresses all the insights from our user research.

  • Resources:
  • Expected result:

    There is a wide range of acceptable results for this idea. We're looking for an improvement to one of our existing tools, or an entirely new tool, that makes users smile while they work with CC licenses.

  • Skills recommended: JavaScript, Python or WordPress/PHP
  • Mentors: Kriti Godey (primary), Breno Ferreira (backup)
  • Difficulty: Hard
  • Proposal tag to use: Usability

Supercharge our search indexer

  • Description:

    CC Search is a system for searching hundreds of millions (eventually billions) of Creative Commons works. We store all of these documents in a PostgreSQL database. To enable rapid search performance on a dataset of this size, we mirror the documents to Elasticsearch weekly. It takes about 20 hours to index 276 million documents, but the speed could be greatly improved through multithreading and parallelization across multiple nodes. This project represents a great opportunity to learn about the challenges of distributed computing.

  • Rationale:

    Faster indexing allows us to deliver higher quality search results to our users in less time.

  • Resources:
  • Expected result:

    Ideally, distributing the indexing process across 5 nodes should cut the indexing time by 80% (to about 4 hours, down from about 20 hours for the current single-node, single-threaded implementation).

  • Skills recommended: Python, basic understanding of threads, basic understanding of databases, benchmark-driven mindset.
  • Mentors: Alden Page (primary), Timid Robot Zehta (backup)
  • Difficulty: Hard
  • Proposal tag to use: Search
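
The target in the Expected result rests on simple arithmetic: with perfectly linear scaling, 20 hours spread over 5 nodes is 4 hours, an 80% reduction. The sketch below works through that number and shows one possible way to split a range of primary keys into contiguous chunks, one per worker. The function names, and the assumption that documents have dense integer IDs, are hypothetical and not taken from the CC Search codebase.

```python
def ideal_indexing_hours(single_node_hours, nodes):
    """Best-case wall-clock time assuming perfectly linear scaling."""
    return single_node_hours / nodes


def partition_id_range(min_id, max_id, workers):
    """Split the inclusive range [min_id, max_id] into contiguous chunks.

    Returns a list of (start, end) tuples, both inclusive. Chunk sizes
    differ by at most one; fewer chunks are returned when the range holds
    fewer IDs than there are workers.
    """
    total = max_id - min_id + 1
    if total <= 0 or workers <= 0:
        return []
    workers = min(workers, total)
    base, extra = divmod(total, workers)
    chunks = []
    start = min_id
    for i in range(workers):
        # The first `extra` chunks each take one additional ID.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


if __name__ == "__main__":
    hours = ideal_indexing_hours(20, 5)
    print(f"ideal time: {hours} hours, reduction: {1 - hours / 20:.0%}")
    for chunk in partition_id_range(1, 276_000_000, 5):
        print(chunk)
```

Real speedups are usually below this ideal because of coordination overhead and shared bottlenecks such as the database and the Elasticsearch cluster, which is why a benchmark-driven approach is recommended.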

Unique IDs for CC-licensed content

  • Description:

    Prototype a CC unique ID registry that links to the CC catalog and provides information about each CC work through the ID, e.g. CC/12345 would display information such as author, number of shares, etc.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for link to all insights):

    • People are motivated to give credit to other people, but they find attribution complicated and a hassle.
    • People like seeing how their work is used, where it goes, and who it touches, but have no easy way to find this out. This insight incorporated the following two insights:
      • People care that the work they share resonates with people, especially personally, but can only know this if they are told directly by the person it resonated with.
      • People want their work to have real world or social impact, but their sense of what these impacts are is vague. However, people can identify some real or potential outcomes from sharing their work that they enjoy.
    • People want to share and find good work, but find it difficult to navigate the abundance of content and information online.
  • Resources:
  • Expected result:

    A working prototype of the unique ID registry for CC-licensed content. It should be connected to our existing data indexed for CC Search and there should be a frontend to show data for a given work.

  • Skills recommended: Python, basic understanding of databases
  • Mentors: Alden Page (primary), Sophine Clachar (backup)
  • Difficulty: Hard
  • Proposal tag to use: Search
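
To make the idea of looking up a work by an ID such as CC/12345 more concrete, here is a minimal in-memory sketch. The `Registry` class, the `WorkRecord` fields, and the sequential numbering scheme are all assumptions for illustration; a real registry would be backed by the existing catalog data rather than a Python dictionary.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Identifiers are assumed to look like "CC/12345".
ID_PATTERN = re.compile(r"CC/([0-9]+)")


@dataclass(frozen=True)
class WorkRecord:
    work_id: int
    title: str
    author: str
    license_name: str


class Registry:
    """Toy registry that hands out sequential IDs and resolves them."""

    def __init__(self):
        self._records: Dict[int, WorkRecord] = {}
        self._next_id = 1

    def register(self, title, author, license_name):
        """Store a work and return its identifier string."""
        record = WorkRecord(self._next_id, title, author, license_name)
        self._records[record.work_id] = record
        self._next_id += 1
        return f"CC/{record.work_id}"

    def lookup(self, identifier) -> Optional[WorkRecord]:
        """Return the record for an identifier, or None if unknown or malformed."""
        match = ID_PATTERN.fullmatch(identifier)
        if match is None:
            return None
        return self._records.get(int(match.group(1)))


if __name__ == "__main__":
    registry = Registry()
    work_id = registry.register("Sunset", "A. Author", "CC BY 4.0")
    print(work_id, registry.lookup(work_id))
```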

Visualize CC Catalog data

  • Description:

    We'd like to create visualizations of all the data that is stored in the Creative Commons catalog (over 250 million works and growing) and how they link to each other.

  • Rationale:

    We've indexed a huge amount of CC-licensed content and made it searchable via CC Search, but we don't have a way of conveying the scale of this work to the community. Being able to access visualizations of all our indexed content is a good way for the community (and us) to see how much data we have indexed and find and explore relationships between CC-licensed content on the web.

  • Resources:
  • Expected result:

    Ideally we'd like these visualizations deployed on the web and updated in near real-time as we index new CC-licensed content.

    The types of visualizations are not set in stone. We've prototyped a force-directed graph in the past and would like to have a maintainable version of that, but other visualizations are up to you!

  • Skills recommended: JavaScript, data visualization tools, potentially Python
  • Mentors: Sophine Clachar (primary), Hugo Solar (backup)
  • Difficulty: Medium
  • Proposal tag to use: Data Visualization

Visualize Flickr's CC license use

  • Description:

    Flickr is one of the top platforms where creative individuals share their works under a Creative Commons license. There are an estimated 400 million CC-licensed works on Flickr and the volume of data increases daily. We are looking to develop a CC license tracker which will entail:

    1. Data driven visualizations on the historical use of CC licenses (from 2004 to present)
    2. A real-time photo tracker that will keep us informed about the usage and/or popularity of CC licenses on this platform.
  • Rationale:

    There is a lot of CC-licensed content out there and we'd like to showcase that in a way that's easy for the community to understand.

  • Resources:
  • Expected result:

    Ideally we'd like these visualizations deployed on the web and updated in near real-time as we index new content.

  • Skills recommended: JavaScript, data visualization tools, basic knowledge of APIs, potentially Python
  • Mentors: Sophine Clachar (primary), Breno Ferreira (backup)
  • Difficulty: Medium
  • Proposal tag to use: Data Visualization
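
Before anything can be drawn, the historical view described in this idea needs counts of works per license per year. The sketch below shows one way such an aggregation might look. The record layout (dicts with `year` and `license` keys) and the sample data are made up for illustration and do not reflect the actual Flickr API response format.

```python
from collections import Counter, defaultdict


def count_licenses_by_year(photos):
    """Count photos per license for each upload year.

    `photos` is an iterable of dicts with hypothetical "year" and
    "license" keys. Returns {year: {license: count}} with years in
    ascending order.
    """
    counts = defaultdict(Counter)
    for photo in photos:
        counts[photo["year"]][photo["license"]] += 1
    return {year: dict(counts[year]) for year in sorted(counts)}


# Made-up records for illustration only; not real Flickr data.
SAMPLE_PHOTOS = [
    {"year": 2005, "license": "CC BY"},
    {"year": 2004, "license": "CC BY"},
    {"year": 2004, "license": "CC BY-SA"},
    {"year": 2004, "license": "CC BY"},
]

if __name__ == "__main__":
    for year, by_license in count_licenses_by_year(SAMPLE_PHOTOS).items():
        print(year, by_license)
```

The resulting nested dictionary maps directly onto a stacked bar or area chart with one series per license.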

Your idea here

  • We are open to original ideas for projects that will help increase the utility of CC-licensed content, ease the process for creators applying CC licenses to their content, or improve CC's internal tools or processes. Please talk to us on the #cc-gsoc channel on Slack or via the mailing list to find a mentor for the project before submitting your proposal.