Thursday, August 6, 2020

Show HN: Hopi – an experiment to run Python code seamlessly from node https://ift.tt/2DIBTDo

Show HN: Hopi – an experiment to run Python code seamlessly from node https://ift.tt/2XCKu1M August 7, 2020 at 04:34AM

Show HN: Dapper Query Builder Using Interpolated Strings and Fluent API https://ift.tt/2F2FflH

Show HN: Dapper Query Builder Using Interpolated Strings and Fluent API https://ift.tt/31pXvwJ August 7, 2020 at 02:14AM

Show HN: TensorBase – a modern big data analytics infrastructure in Rust https://ift.tt/3ihH0K4

Show HN: TensorBase – a modern big data analytics infrastructure in Rust https://tensorbase.io/ August 7, 2020 at 01:43AM

Show HN: Voicegain Speech-to-Text Python SDK https://ift.tt/3gDLB8I

Show HN: Voicegain Speech-to-Text Python SDK https://ift.tt/31xX97s August 6, 2020 at 10:47PM

Show HN: Chrome Extension Developer Starter Kit https://ift.tt/3gyDOZS

Show HN: Chrome Extension Developer Starter Kit https://ift.tt/2C8mJam August 6, 2020 at 07:21PM

Show HN: Founderpath – Raise $10k-$1m in 72 hours, free revenue analytics https://ift.tt/3a0aDwi

Show HN: Founderpath – Raise $10k-$1m in 72 hours, free revenue analytics Hey HN! We're launching Founderpath today to help SaaS founders raise cash without diluting their company. Instead of a merchant cash advance that needs to be paid back within 6-12 months, our standard deal is 15% interest, a 4-year payback, and a $200k check. We get no equity, no warrants, no weird covenants, and don't require personal guarantees. This cash is way cheaper than raising from a VC (which normally dilutes your equity pool by 20%+ and costs you a lot of control via a board seat and veto powers). Over at https://ift.tt/2DCq3Ll we're around all day today showing demos & answering questions. Hope to see you there! August 6, 2020 at 07:40PM

Show HN: Meet Transcript – Record Google Meet Caption & Screenshot to Google Doc https://ift.tt/3icm4E4

Show HN: Meet Transcript – Record Google Meet Caption & Screenshot to Google Doc https://ift.tt/3fyHwBE August 6, 2020 at 07:18PM

Show HN: A searchable list of VC jobs https://ift.tt/30vC3am

Show HN: A searchable list of VC jobs https://ift.tt/30A2fQY August 6, 2020 at 07:04PM

Show HN: Avatarz – A Library of 3D Avatars https://ift.tt/2PrTimA

Show HN: Avatarz – A Library of 3D Avatars https://ift.tt/3fzREKn August 6, 2020 at 06:54PM

Show HN: Create your WebAssembly Calling Card–visual creativity with WASM https://ift.tt/2XA8UsS

Show HN: Create your WebAssembly Calling Card–visual creativity with WASM https://ift.tt/33Aesr4 August 6, 2020 at 06:11PM

Show HN: I made a curated list of free tools for startups https://ift.tt/3gzPWK1

Show HN: I made a curated list of free tools for startups https://ift.tt/33AdFGu August 6, 2020 at 06:06PM

Show HN: CheerpX – x86 virtualization in browser using WebAssembly – Python Demo https://ift.tt/2C2PzZx

Show HN: CheerpX – x86 virtualization in browser using WebAssembly – Python Demo https://ift.tt/3ie84JZ August 6, 2020 at 06:03PM

Show HN: Auth is now available in Supabase (YC S20) https://ift.tt/2DK2lwo

Show HN: Auth is now available in Supabase (YC S20) https://ift.tt/2EV2MER August 6, 2020 at 05:50PM

Show HN: I Made a Clothing Database https://ift.tt/2PvuAlf

Show HN: I Made a Clothing Database https://ift.tt/2WDAXqN August 6, 2020 at 05:45PM

Launch HN: Datafold (YC S20) – Diff Tool for SQL Databases https://ift.tt/3fx2a56

Launch HN: Datafold (YC S20) – Diff Tool for SQL Databases

Hi HN! My name is Gleb. I'm here with my co-founder Alex to tell you about our company Datafold (https://datafold.com). Datafold lets you diff large datasets for fast and powerful regression testing. We support databases such as PostgreSQL, Snowflake, BigQuery, and Redshift.

One of the biggest pain points in developing ETL pipelines – chains of jobs that move, clean, merge and aggregate analytical data – has been regression testing: verifying how a change in source code (mostly SQL) affects the produced data.

Early in my career, as an on-call data engineer at Lyft, I accidentally introduced a breaking code change while attempting to ship a hotfix at 4AM to a SQL job that computed tables for core business analytics. A seemingly small change in filtering logic ended up corrupting data for all downstream pipelines and breaking dashboards for the entire company. Apart from being a silly mistake, this highlighted the lack of proper tooling for testing changes. If there had been a way to quickly compare the data computed by production code vs. the hotfix branch, I would have immediately spotted the alarming divergence and avoided merging the breaking change.

Without a diffing tool, the typical options for regression testing are:

(1) Data “unit tests” (e.g. check primary key uniqueness, ensure values are within an interval, etc.) – these are helpful, but a costly investment. Frameworks such as dbt make it easier, but it’s often still prohibitively hard to verify all assumptions in a large table.

(2) Write custom SQL queries to compare data produced by the prod and dev versions of the source code (e.g. compare counts, match primary keys). This can easily take up 100+ lines of SQL and hours of unsatisfying work, which no one really wants to do.

(3) "Fuck It, Ship It" is always an option, but it is too risky nowadays as analytical data not only powers dashboards but also production ML models.
As this problem is common in data engineering, some large organizations have built and open-sourced their solutions – for example, BigDiffy by Spotify. However, most of these tools are CLI-based and produce results in a plain-text format, which is hard to comprehend when you are dealing with complex data.

To fit the existing workflows of our users, we’ve built a web interface with interactive charts showing both diff summary statistics (e.g. % of different values by column) and a value-level side-by-side comparison (git diff style). But since the mission of the tool is to save engineers as much time as possible, we also opened an API for automation through Airflow or other orchestrators, and built a GitHub workflow that runs a diff on every pull request with changes to ETL code. Since billion-row-scale datasets are not uncommon nowadays, there is an optional sampling feature that helps keep compute costs low and get results within a few minutes no matter how large the dataset is.

We've found Datafold to be a good fit for the following workflows:

(1) Developing data transformations: before an ETL job is shipped to production, it undergoes multiple iterations. Often it’s important to see how data changes between every iteration, and this is particularly useful if you have 1M+ rows and 100+ columns, where “SELECT *” becomes useless.

(2) Code review & testing: large organizations have hundreds of people committing to ETL codebases. Understanding the impact of even a modest SQL diff is daunting. Datafold can produce a data diff for every commit in minutes so changes are well understood.

(3) Data transfer validation: moving large volumes of data between databases is error-prone, especially if done via change data capture (CDC): a single lost event can affect the resulting dataset in a way that is tricky to debug. We allow comparing datasets across different databases, e.g. PostgreSQL & Snowflake.

We've set up a sandbox at https://ift.tt/3ieKsVx so you can see how diffing works.
Shoot us an email (hn@datafold.com) to set up a trial and use it with your own data. We are passionate about improving tooling for data engineers and would love to hear about your experience with developing data pipelines and ensuring data quality. Also, if you think that dataset diffing can be helpful in other domains, we are very curious to learn from you! August 6, 2020 at 05:43PM
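
To make the idea of a data diff concrete, here is a minimal Python sketch of comparing a "prod" and a "dev" version of the same table by primary key. It is not Datafold's implementation or API: the function name `diff_tables`, the sample rows, and the `id` key column are all hypothetical, and the tables are small in-memory lists rather than database queries. It reports rows present on only one side and the percentage of differing values per column among the rows both sides share.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_tables(prod: List[Row], dev: List[Row], key: str) -> Dict[str, Any]:
    """Compare two row sets by primary key (hypothetical helper, illustration only)."""
    prod_by_key = {row[key]: row for row in prod}
    dev_by_key = {row[key]: row for row in dev}

    # Keys that exist on only one side, and keys present on both sides.
    only_in_prod = sorted(prod_by_key.keys() - dev_by_key.keys())
    only_in_dev = sorted(dev_by_key.keys() - prod_by_key.keys())
    shared = sorted(prod_by_key.keys() & dev_by_key.keys())

    # Every non-key column seen in either table.
    columns = sorted({col for row in prod + dev for col in row if col != key})

    changed = {col: 0 for col in columns}
    mismatches = []  # (key, column, prod value, dev value)
    for k in shared:
        for col in columns:
            old, new = prod_by_key[k].get(col), dev_by_key[k].get(col)
            if old != new:
                changed[col] += 1
                mismatches.append((k, col, old, new))

    # Share of differing values per column, over the shared rows only.
    pct_changed = {
        col: (100.0 * n / len(shared) if shared else 0.0)
        for col, n in changed.items()
    }
    return {
        "only_in_prod": only_in_prod,
        "only_in_dev": only_in_dev,
        "pct_changed": pct_changed,
        "mismatches": mismatches,
    }


# Made-up sample data standing in for the output of two versions of an ETL job.
prod = [
    {"id": 1, "city": "SF", "rides": 10},
    {"id": 2, "city": "NYC", "rides": 7},
    {"id": 3, "city": "LA", "rides": 4},
]
dev = [
    {"id": 1, "city": "SF", "rides": 10},
    {"id": 2, "city": "NYC", "rides": 9},
    {"id": 4, "city": "SEA", "rides": 1},
]

result = diff_tables(prod, dev, key="id")
for k, col, old, new in result["mismatches"]:
    print(f"id={k} {col}: {old} -> {new}")
```

For real tables the same comparison would have to run inside the database or on a sample, since loading billions of rows into memory like this is not practical.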

Show HN: Polygrid – Fluidly explore catalogs with millions of items https://ift.tt/2XziV9C

Show HN: Polygrid – Fluidly explore catalogs with millions of items https://polygrid.com/ August 6, 2020 at 05:26PM

Show HN: SameTunes – A Music Compatibility Platform for Spotify https://ift.tt/3kcLEuB

Show HN: SameTunes – A Music Compatibility Platform for Spotify https://sametunes.com/ August 6, 2020 at 03:56PM

Show HN: Tableau-like Data Visualization library (free), powered by WebAssembly https://ift.tt/3fDoQAp

Show HN: Tableau-like Data Visualization library (free), powered by WebAssembly https://muzejs.org August 6, 2020 at 02:14PM

Show HN: Best-Books.dev, the best programming books, all in one place https://ift.tt/3a0atoN

Show HN: Best-Books.dev, the best programming books, all in one place https://best-books.dev August 6, 2020 at 10:36AM

Show HN: A form to get the right international bank info (Transferwise and co) https://ift.tt/3kgLEd1

Show HN: A form to get the right international bank info (Transferwise and co) https://payday.sh/ August 6, 2020 at 09:10AM