DAS 10A: Developing the Discovery Engine: Lessons Learned

Track: Data Analytics Symposium

Session Number: DAS 10A
Date: Thu, Jul 27th, 2017
Time: 3:00 PM - 3:45 PM

Description:

At Berkeley, we built a novel prospecting tool (the "discovery engine") in which users specify constituencies with simple predicates, such as "has an interest in neuroscience", that can be combined into complex and precise definitions. The tool generates SQL behind the scenes and returns the IDs that satisfy the definition. It has had a significant impact on the way the Research team does prospecting, and its easy extensibility has led to new applications that we never imagined when building the tool. I will talk about how and why we built this tool, and the lessons we learned along the way. The public documentation for the project is here: https://tarakc02.github.io/discodocs/
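
To make the idea of combinable predicates concrete, here is a minimal sketch in Python. It is not the discovery engine's actual API (the tool itself is written in R; see the public documentation for its real interface). The table names `interest` and `degree`, the column `entity_id`, and the helpers `has_interest`, `has_degree_from`, `run`, and `build_demo_db` are all hypothetical, chosen only to show how small predicates might compose into one generated SQL query that returns IDs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A constituency definition: a SQL query returning entity IDs, plus its bind parameters."""
    sql: str
    params: tuple = ()

    def _combine(self, other, set_op):
        # Wrap both sides as subqueries and join them with a SQL set operator.
        # Parameters stay in the same left-to-right order as their placeholders.
        sql = (f"SELECT entity_id FROM ({self.sql}) "
               f"{set_op} SELECT entity_id FROM ({other.sql})")
        return Predicate(sql, self.params + other.params)

    def __and__(self, other):   # both definitions hold
        return self._combine(other, "INTERSECT")

    def __or__(self, other):    # either definition holds
        return self._combine(other, "UNION")

    def __sub__(self, other):   # first holds, second does not
        return self._combine(other, "EXCEPT")


def has_interest(topic):
    # Hypothetical predicate over a hypothetical `interest` table.
    return Predicate("SELECT DISTINCT entity_id FROM interest WHERE topic = ?", (topic,))


def has_degree_from(school):
    # Hypothetical predicate over a hypothetical `degree` table.
    return Predicate("SELECT DISTINCT entity_id FROM degree WHERE school = ?", (school,))


def run(conn, definition):
    """Execute the generated SQL and return the matching IDs in sorted order."""
    return sorted(row[0] for row in conn.execute(definition.sql, definition.params))


def build_demo_db():
    """Create a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interest (entity_id INTEGER, topic TEXT)")
    conn.execute("CREATE TABLE degree (entity_id INTEGER, school TEXT)")
    conn.executemany("INSERT INTO interest VALUES (?, ?)",
                     [(1, "neuroscience"), (2, "neuroscience"), (3, "music")])
    conn.executemany("INSERT INTO degree VALUES (?, ?)",
                     [(1, "engineering"), (3, "engineering"), (2, "law")])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    definition = has_interest("neuroscience") & has_degree_from("engineering")
    print(definition.sql)        # the SQL generated behind the scenes
    print(run(conn, definition))
```

In this sketch each predicate is just a query plus its parameters, so adding a new predicate means writing one small function, and every combination remains a valid definition that can be combined further.
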
Sub-Categorization: Enterprise Track
Session Type: Breakout Session (45 minutes)

Primary Competency: PR:Competency 1: Information Management/Records Management
Secondary Competency: DA:Competency 8: Analytics and Campaign
Tertiary Competency: CA:Competency 4: Campaign Identification and Pipeline
Intended Audience Level: Level II
Learning Objective #1: Attendees will learn how to model high-level domain problems using a computer programming language such as R.
Learning Objective #2: Attendees will learn how to leverage resources such as a data warehouse and a data dictionary to design tools that respond to specific business needs.
Prerequisites: Experience coding in any language is required. Familiarity with the R programming language is helpful, as is familiarity with relational databases and SQL.