Data Programming I - Fall 25
Timeline
- September 13, 2025: Experience start
- December 13, 2025: Experience end
Experience scope
Categories
Data visualization, Data analysis, Data modelling, Data science
Skills
NoSQL, Apache HBase, presentations, adult education, Apache Kafka, Apache Spark, Java (programming language), Scala (programming language), application programming interface (API), computer science
This course is part of the Big Data Programming and Analytics certificate program.
Students in the program are adult learners with a post-secondary degree/diploma in
computer science, engineering, business, etc.
This course examines how to develop solutions for extracting and analyzing big data sets
using various technologies. Students will learn Scala and Java, the languages in which
Spark, Kafka, and HBase are written. The main focus will be on Apache Spark and its
components. Students will also explore tools used in real-time analytics, such as Kafka
and HBase. NoSQL databases will also be covered in this course.
Course activities will include instructor presentations, required readings and experiential
learning activities (e.g. case studies, group discussions, and projects).
Learners
The final project deliverables will include:
- A report on students’ findings and details of the problem presented
- Future collaboration ideas, identified based on current project outcomes
Project timeline
- September 13, 2025: Experience start
- December 13, 2025: Experience end
Project examples
The project provides an opportunity for businesses and learners to collaborate to
identify and translate a real business problem into an analytics problem.
The projects, which can be short, will allow students to apply the skills they have
acquired with the various tools to address the business problem. Students will also
learn how to implement real-time scenarios.
Some examples of potential projects:
- Development of search and analytics solutions
- Development of highly scalable and cost-efficient applications with MongoDB
- Building MongoDB data models for enterprise applications
- Deployment and management of Elasticsearch clusters
You should submit a high-level proposal/business problem statement including relevant
data sets and definitions, a list of acceptable tools (if applicable), and expected
deliverables. Business datasets could be provided based on a non-disclosure
agreement or in an anonymized/synthetic data format that is relevant to your
organization and business problem. The course instructors will review the documents to
confirm the scope and timing of the proposed problem and its alignment with the
capstone course requirements.
Analytics solutions may apply to, but are not limited to, the following
topics:
1. Demand for social services (healthcare, emergency services, infrastructure, etc.)
2. Customer acquisition and retention
3. Merchandising for trade areas (categories)
4. Quantifying Customer Lifetime Value
5. Determining media consumption (mass vs digital)
6. Cross-sell and upsell opportunities
7. Developing high-propensity target markets
8. Customer segmentation (behavioral or transactional)
9. New product or product line development
10. Market Basket Analysis to understand which items are often purchased together
11. Ranking markets by potential revenue
12. Consumer personification
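To illustrate topic 10, the following is a minimal sketch of the idea behind Market Basket Analysis: counting how often pairs of items appear in the same purchase. The transactions, item names, and the helpers `pair_counts` and `support` are hypothetical and are not part of the course material. Plain Python is used here for brevity; a course project would more likely apply the same idea with Spark in Scala or Java.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each inner list is one customer's basket.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter"],
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so that (bread, milk) and (milk, bread) match.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Fraction of baskets that contain both items of the pair."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)


counts = pair_counts(transactions)
# List the pairs that are bought together most often.
print(counts.most_common(3))
```

Pairs with high support are candidates for cross-sell or merchandising decisions. On a dataset of realistic size, a frequent-pattern algorithm would likely replace this brute-force pair count.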
To ensure students’ learning objectives are achieved, we recommend that datasets
contain at least 20,000 rows. Data should be ‘clean’. If several databases are
provided that must be combined, students will be required to integrate them. This
supports the learning experience and minimizes partner data preparation.
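As a rough sketch of what integrating two provided datasets can involve, the example below joins two small tables on a shared key, counts unmatched rows, and checks for missing values and the recommended minimum size. The tables, the column names, and the helpers `join_on` and `rows_with_missing` are hypothetical. The join assumes each key appears only once in the right-hand table.

```python
MIN_ROWS = 20_000  # recommended minimum dataset size from the text above

# Hypothetical partner datasets, represented as lists of dicts.
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.0},
    {"order_id": 12, "customer_id": 3, "amount": 15.0},  # no matching customer
]


def join_on(left, right, key):
    """Inner join: keep left rows whose key also appears in right."""
    index = {row[key]: row for row in right}  # assumes unique keys in right
    return [{**index[row[key]], **row} for row in left if row[key] in index]


def rows_with_missing(rows):
    """Return rows that contain a None or empty-string value."""
    return [r for r in rows if any(v is None or v == "" for v in r.values())]


joined = join_on(orders, customers, "customer_id")
unmatched = len(orders) - len(joined)   # orders dropped by the join
meets_size = len(joined) >= MIN_ROWS    # False for this toy data

print(len(joined), unmatched, meets_size, rows_with_missing(joined))
```

Reporting the number of unmatched rows alongside the joined result makes it easier for the partner and the students to agree on whether the data is clean enough to use.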
Additional company criteria
Companies must answer the following questions to submit a match request to this experience: